Applied  Linear  Regression  Models 


John  Neter 

University  of  Georgia 

William  Wasserman 

Syracuse  University 

Michael  H.  Kutner 

Emory  University 


RICHARD D. IRWIN, INC.
Homewood, Illinois 60430


©  RICHARD  D.  IRWIN,  INC.,  1983 

All  rights  reserved.  No  part  of  this  publication  may  be 
reproduced,  stored  in  a  retrieval  system,  or  transmitted, 
in  any  form  or  by  any  means,  electronic,  mechanical, 
photocopying,  recording,  or  otherwise,  without  the  prior 
written  permission  of  the  publisher. 


ISBN  0-256-02547-9 

Library  of  Congress  Catalog  Card  No.  82-82121 
Printed  in  the  United  States  of  America 


1234567890 MP 09876543 


To 

Dorothy,  Ron,  David 
Cathy,  Christopher,  Timothy, 
Randall,  Erin,  Fiona 
Nancy,  Michelle,  Allison 


Preface 


Applied  Linear  Regression  Models  is  a  revision  of  the  regression  portion  of 
Applied  Linear  Statistical  Models.  The  publication  of  a  separate  book  which 
offers  a  revised  and  updated  treatment  of  regression  models  fills  an  important 
need  in  view  of  the  many  significant  developments  in  regression  analysis  in 
recent  years.  The  remainder  of  Applied  Linear  Statistical  Models  is  now  being 
revised  and  will  be  published  upon  completion  of  the  revision. 

Linear regression models are widely used today in business administration,
economics, and the social, health, and biological sciences. Successful
application of these models requires a sound understanding of both the underlying
theory and the practical problems which are encountered in using the models in
real-life situations. While Applied Linear Regression Models is basically an
applied book, it seeks to blend theory and applications effectively, avoiding the
extremes of presenting theory in isolation and of giving elements of applications
without the needed understanding of the theoretical foundations.

Applied  Linear  Regression  Models  differs  from  Applied  Linear  Statistical 
Models  in  a  number  of  important  respects. 

1. We have added some important new topics, including detection of
multicollinearity, ridge regression, detection of influential observations, and
nonlinear regression. In recent years, noteworthy new developments in the
detection of multicollinearity and influential observations have taken place, and
a current text in regression analysis needs to cover these topics adequately. In
addition, we have added some specialized topics, such as P-values of statistical
tests and the method of least absolute deviations.

2. We have also reorganized and expanded a number of topics, including
weighted regression, selection of independent variables, and normal probability
plots. At the same time, we have made extensive revisions of other materials on
the basis of classroom experience to improve the clarity of the presentation.

3. The scope of the examples has been expanded to include applications from
the health and biological sciences, in addition to applications from management,
economics, and the social sciences. In all cases, an application can be readily
understood by the general reader, regardless of background.

4.  We  have  introduced  a  number  of  computer-generated  plots  to  demonstrate 
the  usefulness  of  computer  graphics  in  regression  analysis. 

5.  We  have  added  two  extensive  real-world  data  sets  that  can  be  employed  in 
a  variety  of  ways. 

6. Finally, we have substantially expanded the problem materials at the ends
of the chapters and have grouped them into three categories, namely Problems,
Exercises, and Projects. The Problems category includes basic problems and
questions, the Exercises category includes conceptual and theoretical questions,
and the Projects category includes problems utilizing large data sets and/or
involving extensive calculations and analysis.

We have included in this book not only the more conventional topics in
regression but also topics that are frequently slighted though important in
practice. Thus, we devote a full chapter to indicator variables, covering both
dependent and independent indicator variables. Another chapter takes up
computer-assisted selection procedures for obtaining a “best” set of independent
variables to be employed in the regression model. The use of residual analysis for
examining the aptness of the model is a recurring theme throughout this book. So
is the use of remedial measures that may be helpful when the model is not
appropriate. In the analysis of the results of a study, we emphasize the use of
estimation procedures, rather than tests, because estimation is often more
meaningful in practice. Also, since practical problems seldom are concerned with a
single estimate, we stress the use of simultaneous estimation procedures.

Theoretical  ideas  are  presented  to  the  degree  needed  for  good  understanding  in 
making  sound  applications.  Proofs  are  given  in  those  instances  where  we  feel 
they  serve  to  demonstrate  an  important  method  of  approach.  Emphasis  is  placed 
on  a  thorough  understanding  of  the  models,  particularly  the  meaning  of  the 
model  parameters,  since  such  understanding  is  basic  to  proper  applications.  A 
wide  variety  of  case  examples  is  presented  to  illustrate  the  use  of  the  theoretical 
principles,  to  show  the  great  diversity  of  applications  of  regression  models,  and 
to  demonstrate  how  analyses  are  carried  out  for  different  problems. 

We use “Notes” and “Comments” sections in each chapter to present
additional discussion and matters related to the mainstream of development. In this
way, the basic ideas in a chapter are presented concisely and without distraction.
Similarly, optional “Topics” chapters supplement chapters containing the main
development and present a variety of additional topics that in most cases can be
omitted without loss of continuity.

Applications of regression models frequently require extensive computations.
We take the position that a computer is available in most applied work. Further,
almost every computer user has access to program packages for regression
analysis. Hence, we explain the basic mathematical steps in fitting a regression model
but do not dwell on computational details. This approach permits us to avoid
many complex formulas and enables us to focus on basic principles. We make
extensive use in this text of computer capabilities for performing computations
and illustrate a variety of computer printouts and explain how these are used for
analysis.

A selection of problems is provided at the end of each chapter (excepting
Chapter 1). Here the reader can reinforce his or her understanding of the
methodology and use the concepts learned to analyze data. We have been careful to
supply data-analysis problems that typify genuine applications. In most problems
the calculations are best handled on a calculator or computer, and we urge that
this avenue be used when possible.

We assume that the reader of Applied Linear Regression Models has had an
introductory course in statistical inference, covering the material outlined in
Chapter 1. Should some gaps in the reader’s background exist, he or she can read
the relevant portions of an introductory text, or the instructor of the class may use
supplemental materials for covering the missing segments. Chapter 1 is primarily
intended as a reference chapter of basic statistical results for continuing use as the
reader progresses through the book.

Calculus is not required for reading Applied Linear Regression Models. In a
number of instances we use calculus to demonstrate how some important results
are obtained, but these demonstrations are confined to supplementary comments
or notes and can be omitted without any loss of continuity. Readers who do know
calculus will find these comments and notes in natural sequence so that the
benefits of the mathematical developments are obtained in their immediate
context. Some basic elements of matrix algebra are needed for multiple regression.
Chapter 6 introduces these elements of matrix algebra in the context of simple
regression for easy learning.

Applied Linear Regression Models is intended for use in undergraduate or
graduate courses in regression analysis and in second courses in applied
statistics. The extent to which material presented in this text is used in a particular
course depends upon the amount of time available and the objectives of the
course. The basic elements of regression are covered in Chapters 2, 3, 4, 5
(Sections 5.1-5.4 only), 6, 7, 8, and 12. Chapters 9, 10, 11, 13, 14, and 15 can
be covered as time permits and interests dictate.

This book can also be used for self-study by persons engaged in the fields of
business administration, economics, and the social, health, and biological
sciences who desire to obtain competence in the application of regression models.

A book such as this cannot be written without substantial assistance from
others. We are indebted to the many contributors who have developed the theory
and practice discussed in this book. We also would like to acknowledge
appreciation to our students who helped us in a variety of ways to fashion the method of
presentation contained herein. We are grateful to the many users of Applied
Linear Statistical Models who provided us with comments and suggestions based
on their teaching with this text. We are also indebted to Professors James E.
Holstein, University of Missouri, and David L. Sherry, University of West
Florida, who carefully reviewed Applied Linear Statistical Models to provide
suggestions for this volume. Robert L. Vogel assisted us diligently in the checking of
the manuscript, for which we are most appreciative. Michael J. Lynn prepared
the computer-generated plots using a Zeta model 3600 plotter, and George
Cotsonis and Shizuki Yamamoto assisted us in checking of calculations and in
other ways. Almost all of the typing was done by Rebecca Baggett, who ably
handled the preparation of a difficult manuscript. We are most grateful to all of
these persons for their help and assistance.

Finally,  our  families  bore  patiently  the  pressures  caused  by  our  commitment  to 
complete  this  revision.  We  are  appreciative  of  their  understanding. 

John  Neter 
William  Wasserman 
Michael  H.  Kutner 


Contents 


1.  Some  basic  results  in  probability  and  statistics,  1 

1.1  Summation  and  product  operators,  1 

1.2  Probability,  2 

1.3  Random  variables,  3 

1.4  Normal  probability  distribution  and  related  distributions,  6 

1.5  Statistical  estimation,  9 

1.6  Inferences  about  population  mean — Normal  population,  10 

1.7  Comparisons  of  two  population  means — Normal  populations,  13 

1.8  Inferences  about  population  variance — Normal  population,  16 

1.9  Comparisons  of  two  population  variances — Normal  populations,  17 

Part  I  Basic  regression  analysis,  21 

2.  Linear  regression  with  one  independent  variable,  23 

2.1  Relations  between  variables,  23 

2.2  Regression  models  and  their  uses,  26 

2.3  Regression  model  with  distribution  of  error  terms  unspecified,  31 

2.4  Estimation  of  regression  function,  35 

2.5  Estimation of error terms variance $\sigma^2$, 46

2.6  Normal  error  regression  model,  48 

2.7  Computer  inputs  and  outputs,  51 



3.  Inferences  in  regression  analysis,  60 

3.1  Inferences concerning $\beta_1$, 60

3.2  Inferences concerning $\beta_0$, 68

3.3  Some considerations on making inferences concerning $\beta_0$ and $\beta_1$, 70

3.4  Interval  estimation  of  E(Yh),  72 

3.5  Prediction  of  new  observation,  76 

3.6  Considerations  in  applying  regression  analysis,  82 

3.7  Case  when  X  is  random,  83 

3.8  Analysis  of  variance  approach  to  regression  analysis,  84 

3.9  Descriptive measures of association between X and Y in regression model, 96

3.10  Computer  output,  99 

4.  Aptness  of  model  and  remedial  measures,  109 

4.1  Residuals,  109 

4.2  Graphic  analysis  of  residuals,  111 

4.3  Tests  involving  residuals,  122 

4.4  F  test  for  lack  of  fit,  123 

4.5  Remedial  measures,  132 

4.6  Transformations,  134 

5.  Simultaneous inferences and other topics in regression analysis — I, 147

5.1  Joint estimation of $\beta_0$ and $\beta_1$, 147

5.2  Confidence  band  for  regression  line,  154 

5.3  Simultaneous  estimation  of  mean  responses,  157 

5.4  Simultaneous  prediction  intervals  for  new  observations,  159 

5.5  Regression  through  the  origin,  160 

5.6  Effect  of  measurement  errors,  164 

5.7  Weighted  least  squares,  167 

5.8  Inverse  predictions,  172 

5.9  Choice  of  X  levels,  175 


Part  II  General  regression  and  correlation  analysis,  183 

6.  Matrix  approach  to  simple  regression  analysis,  185 

6.1  Matrices,  185 

6.2  Matrix  addition  and  subtraction,  190 

6.3  Matrix  multiplication,  192 

6.4  Special  types  of  matrices,  196 




6.5  Linear  dependence  and  rank  of  matrix,  199 

6.6  Inverse  of  a  matrix,  200 

6.7  Some  basic  theorems  for  matrices,  204 

6.8  Random  vectors  and  matrices,  205 

6.9  Simple  linear  regression  model  in  matrix  terms,  208 

6.10  Least  squares  estimation  of  regression  parameters,  210 

6.11  Analysis  of  variance  results,  212 

6.12  Inferences  in  regression  analysis,  216 

6.13  Weighted  least  squares,  219 

6.14  Residuals,  220 

7.  Multiple  regression— I,  226 

7.1  Multiple  regression  models,  226 

7.2  General  linear  regression  model  in  matrix  terms,  237 

7.3  Least  squares  estimators,  238 

7.4  Analysis of variance results, 239

7.5  Inferences  about  regression  parameters,  242 

7.6  Inferences  about  mean  response,  244 

7.7  Predictions  of  new  observations,  246 

7.8  An  example — Multiple  regression  with  two  independent  variables,  247 

7.9  Standardized  regression  coefficients,  261 

7.10  Weighted  least  squares,  263 

8.  Multiple  regression — II,  271 

8.1  Multicollinearity  and  its  effects,  271 

8.2  Decomposition  of  SSR  into  extra  sums  of  squares,  282 

8.3  Coefficients  of  partial  determination,  286 

8.4  Testing hypotheses concerning regression coefficients in multiple regression, 289

8.5  Matrix  formulation  of  general  linear  test,  293 

9.  Polynomial  regression,  300 

9.1  Polynomial  regression  models,  300 

9.2  Example  1 — One  independent  variable,  305 

9.3  Example  2 — Two  independent  variables,  313 

9.4  Estimating the maximum or minimum of a quadratic regression function, 317

9.5  Some  further  comments  on  polynomial  regression,  319 

10.  Indicator  variables,  328 

10.1  One  independent  qualitative  variable,  328 

10.2  Model  containing  interaction  effects,  335 




10.3  More  complex  models,  339 

10.4  Other  uses  of  independent  indicator  variables,  343 

10.5  Some  considerations  in  using  independent  indicator  variables,  351 

10.6  Dependent  indicator  variable,  354 

10.7  Linear  regression  with  dependent  indicator  variable,  357 

10.8  Logistic  response  function,  361 

11.  Multicollinearity, influential observations, and other topics in regression analysis — II, 377

11.1  Reparameterization  to  improve  computational  accuracy,  377 

11.2  Problems  of  multicollinearity,  382 

11.3  Variance inflation factors and other methods of detecting multicollinearity, 390

11.4  Ridge  regression  and  other  remedial  measures  for  multicollinearity,  393 

11.5  Identification  of  outlying  observations,  400 

11.6  Identification  of  influential  observations  and  remedial  measures,  407 

12.  Selection  of  independent  variables,  417 

12.1  Nature  of  problem,  417 

12.2  Example,  419 

12.3  All  possible  regression  models,  421 

12.4  Stepwise  regression,  430 

12.5  Selection  of  variables  with  ridge  regression,  436 

12.6  Implementation  of  selection  procedures,  436 

13.  Autocorrelation  in  time  series  data,  444 

13.1  Problems  of  autocorrelation,  445 

13.2  First-order  autoregressive  error  model,  448 

13.3  Durbin-Watson  test  for  autocorrelation,  450 

13.4  Remedial  measures  for  autocorrelation,  454 

14.  Nonlinear  regression,  466 

14.1  Linear,  intrinsically  linear,  and  nonlinear  regression  models,  466 

14.2  Example,  469 

14.3  Least  squares  estimation  in  nonlinear  regression,  470 

14.4  Inferences  about  nonlinear  regression  parameters,  480 

14.5  Learning  curve  example,  483 

15.  Normal  correlation  models,  491 

15.1  Distinction  between  regression  and  correlation  models,  491 

15.2  Bivariate  normal  distribution,  492 




15.3  Conditional  inferences,  496 

15.4  Inferences on $\rho_{12}$, 501

15.5  Multivariate  normal  distribution,  505 

Appendix  tables,  515 
SENIC  data  set,  533 
SMSA  data  set,  537 


Index, 543


1 


Some  basic  results  in 
probability  and  statistics 


This  chapter  contains  some  basic  results  in  probability  and  statistics.  It  is 
intended  as  a  reference  chapter  to  which  you  may  refer  as  you  read  this  book. 
Sometimes,  specific  references  to  results  in  this  chapter  are  made  in  the  text.  At 
other  times,  you  may  wish  to  refer  on  your  own  to  particular  results  in  this 
chapter  as  you  feel  the  need. 

You  may  prefer  to  scan  the  results  on  probability  and  statistical  inference  in 
this  chapter  before  reading  Chapter  2,  or  you  may  proceed  directly  to  the  next 
chapter. 


1.1  SUMMATION  AND  PRODUCT  OPERATORS 
Summation  operator 

The summation operator $\sum$ is defined as follows:

(1.1)  $\sum_{i=1}^{n} Y_i = Y_1 + Y_2 + \cdots + Y_n$

Some important properties of this operator are:

(1.2a)  $\sum_{i=1}^{n} k = nk$  where $k$ is a constant

(1.2b)  $\sum_{i=1}^{n} (Y_i + Z_i) = \sum_{i=1}^{n} Y_i + \sum_{i=1}^{n} Z_i$

(1.2c)  $\sum_{i=1}^{n} (a + cY_i) = na + c\sum_{i=1}^{n} Y_i$  where $a$ and $c$ are constants

The double summation operator $\sum\sum$ is defined as follows:

(1.3)  $\sum_{i=1}^{n} \sum_{j=1}^{m} Y_{ij} = \sum_{i=1}^{n} (Y_{i1} + \cdots + Y_{im})$
       $= Y_{11} + \cdots + Y_{1m} + Y_{21} + \cdots + Y_{2m} + \cdots + Y_{nm}$

An important property of the double summation operator is:

(1.4)  $\sum_{i=1}^{n} \sum_{j=1}^{m} Y_{ij} = \sum_{j=1}^{m} \sum_{i=1}^{n} Y_{ij}$

Product operator

The product operator $\prod$ is defined as follows:

(1.5)  $\prod_{i=1}^{n} Y_i = Y_1 \cdot Y_2 \cdot Y_3 \cdots Y_n$
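
The operator properties above can be checked numerically. The following is a minimal sketch in Python; the data values and the names `Y`, `Z`, `a`, `c`, and `table` are illustrative assumptions and do not come from the text.

```python
import math

# Illustrative data (assumed values, not from the text)
Y = [2, 5, 7, 10]
Z = [1, 3, 4, 6]
a, c = 3, 2
n = len(Y)

# (1.2b): the sum of (Y_i + Z_i) equals the sum of Y_i plus the sum of Z_i
lhs_12b = sum(y + z for y, z in zip(Y, Z))
rhs_12b = sum(Y) + sum(Z)

# (1.2c): the sum of (a + c*Y_i) equals n*a + c * (sum of Y_i)
lhs_12c = sum(a + c * y for y in Y)
rhs_12c = n * a + c * sum(Y)

# (1.4): a double sum may be taken over rows first or over columns first
table = [[1, 2, 3], [4, 5, 6]]  # Y_ij with n = 2 rows and m = 3 columns
rows_first = sum(sum(row) for row in table)
cols_first = sum(sum(col) for col in zip(*table))

# (1.5): the product operator multiplies all the terms together
product = math.prod(Y)

print(lhs_12b == rhs_12b, lhs_12c == rhs_12c, rows_first == cols_first, product)
```

Integer data are used so that both sides of each identity agree exactly rather than only up to rounding.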


1.2  PROBABILITY 

Addition  theorem 

Let $A_i$ and $A_j$ be two events defined on a sample space. Then:

(1.6)  $P(A_i \cup A_j) = P(A_i) + P(A_j) - P(A_i \cap A_j)$

where $P(A_i \cup A_j)$ denotes the probability of either $A_i$ or $A_j$ or both occurring;
$P(A_i)$ and $P(A_j)$ denote, respectively, the probability of $A_i$ and the probability of
$A_j$; and $P(A_i \cap A_j)$ denotes the probability of both $A_i$ and $A_j$ occurring.




Multiplication  theorem 

Let $P(A_i \mid A_j)$ denote the conditional probability of $A_i$ occurring, given that $A_j$
has occurred. This conditional probability is defined as follows:

(1.7)  $P(A_i \mid A_j) = \frac{P(A_i \cap A_j)}{P(A_j)}$  $P(A_j) \neq 0$

The multiplication theorem states:

(1.8)  $P(A_i \cap A_j) = P(A_i)P(A_j \mid A_i)$
       $= P(A_j)P(A_i \mid A_j)$


Complementary  events 

The complementary event of $A_i$ is denoted by $\bar{A}_i$. The following results for
complementary events are useful:

(1.9)  $P(\bar{A}_i) = 1 - P(A_i)$

(1.10)  $P(\overline{A_i \cup A_j}) = P(\bar{A}_i \cap \bar{A}_j)$
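
The theorems of this section can be illustrated with a small finite sample space. The sketch below assumes one roll of a fair die; the events `A_i` ("even") and `A_j` ("greater than 3") and the helper `P` are hypothetical choices made only for illustration.

```python
from fractions import Fraction

# Sample space: one roll of a fair die (an assumed example)
omega = set(range(1, 7))
A_i = {2, 4, 6}  # event "even"
A_j = {4, 5, 6}  # event "greater than 3"


def P(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(omega))


# (1.6) addition theorem
union_direct = P(A_i | A_j)
union_by_theorem = P(A_i) + P(A_j) - P(A_i & A_j)

# (1.7) conditional probability of A_i given A_j
P_i_given_j = P(A_i & A_j) / P(A_j)

# (1.8) multiplication theorem
joint_by_theorem = P(A_j) * P_i_given_j

# (1.9) and (1.10) complementary events
comp_i = omega - A_i
comp_j = omega - A_j
comp_union = omega - (A_i | A_j)

print(union_direct, union_by_theorem, P_i_given_j, joint_by_theorem)
```

Exact fractions are used so that the two sides of each identity can be compared with `==`.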

1.3  RANDOM  VARIABLES 

Throughout  this  section,  we  assume  that  the  random  variable  Y  assumes  a 
finite  number  of  outcomes.  (If  Y  is  a  continuous  random  variable,  the  summation 
process  is  replaced  by  integration.) 

Expected  value 

Let the random variable Y assume the outcomes $Y_1, \ldots, Y_k$ with probabilities
given by the probability function:

(1.11)  $f(Y_s) = P(Y = Y_s)$  $s = 1, \ldots, k$

The expected value of Y is defined:

(1.12)  $E(Y) = \sum_{s=1}^{k} Y_s f(Y_s)$

An important property of the expectation operator E is:

(1.13)  $E(a + cY) = a + cE(Y)$  where $a$ and $c$ are constants

Special cases of this are:

(1.13a)  $E(a) = a$

(1.13b)  $E(cY) = cE(Y)$

(1.13c)  $E(a + Y) = a + E(Y)$
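
As a small worked check of (1.12) and (1.13), the sketch below computes an expected value directly from a probability function. The three outcomes, their probabilities, and the helper name `expected_value` are assumptions chosen for illustration.

```python
from fractions import Fraction

# Assumed discrete distribution: outcomes Y_s with probabilities f(Y_s)
outcomes = [0, 1, 2]
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]


def expected_value(values, probabilities):
    """Sum of each outcome times its probability, as in (1.12)."""
    return sum(v * p for v, p in zip(values, probabilities))


E_Y = expected_value(outcomes, probs)

# (1.13): E(a + cY) should equal a + c*E(Y)
a, c = 5, 3
E_linear = expected_value([a + c * y for y in outcomes], probs)

print(E_Y, E_linear, a + c * E_Y)
```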




Variance 

The variance of the random variable Y is denoted by $\sigma^2(Y)$ and is defined as
follows:

(1.14)  $\sigma^2(Y) = E\{[Y - E(Y)]^2\}$

An equivalent expression is:

(1.14a)  $\sigma^2(Y) = E(Y^2) - [E(Y)]^2$

The variance of a linear function of Y is frequently encountered. We denote
the variance of $a + cY$ by $\sigma^2(a + cY)$ and have:

(1.15)  $\sigma^2(a + cY) = c^2\sigma^2(Y)$  where $a$ and $c$ are constants

Special cases of this result are:

(1.15a)  $\sigma^2(a + Y) = \sigma^2(Y)$

(1.15b)  $\sigma^2(cY) = c^2\sigma^2(Y)$
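
The definition (1.14), the equivalent expression (1.14a), and the linear-function result (1.15) can be compared on a small example. This is a sketch under assumed inputs: the distribution and the helpers `expectation` and `variance` are hypothetical.

```python
from fractions import Fraction

# Assumed discrete distribution (illustrative only)
outcomes = [0, 1, 2]
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]


def expectation(values, probabilities):
    return sum(v * p for v, p in zip(values, probabilities))


def variance(values, probabilities):
    """Expected squared deviation from the mean, as in (1.14)."""
    mean = expectation(values, probabilities)
    return expectation([(v - mean) ** 2 for v in values], probabilities)


var_Y = variance(outcomes, probs)

# (1.14a): E(Y^2) - [E(Y)]^2
var_shortcut = (
    expectation([y ** 2 for y in outcomes], probs)
    - expectation(outcomes, probs) ** 2
)

# (1.15): the constant a drops out and c enters squared
a, c = 5, 3
var_linear = variance([a + c * y for y in outcomes], probs)

print(var_Y, var_shortcut, var_linear)
```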


Joint,  marginal,  and  conditional  probability  distributions 

Let the joint probability function for the two random variables Y and Z be
denoted by g(Y, Z):

(1.16)  $g(Y_s, Z_t) = P(Y = Y_s \cap Z = Z_t)$

The marginal probability function of Y, denoted by f(Y), is:

(1.17a)  $f(Y_s) = \sum_{t=1}^{m} g(Y_s, Z_t)$  $s = 1, \ldots, k$

and the marginal probability function of Z, denoted by h(Z), is:

(1.17b)  $h(Z_t) = \sum_{s=1}^{k} g(Y_s, Z_t)$

The conditional probability function of Y, given $Z = Z_t$, is:

(1.18a)  $f(Y_s \mid Z_t) = \frac{g(Y_s, Z_t)}{h(Z_t)}$  $h(Z_t) \neq 0$; $s = 1, \ldots, k$

and the conditional probability function of Z, given $Y = Y_s$, is:

(1.18b)  $h(Z_t \mid Y_s) = \frac{g(Y_s, Z_t)}{f(Y_s)}$  $f(Y_s) \neq 0$; $t = 1, \ldots, m$
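
To make the step from a joint probability function to its marginal and conditional functions concrete, here is a minimal sketch. The joint table `g` is an assumed example with k = 2 outcomes for Y and m = 3 outcomes for Z.

```python
from fractions import Fraction

# Assumed joint probability function g(Y_s, Z_t), keyed by (y, z)
g = {
    (0, 0): Fraction(1, 10), (0, 1): Fraction(2, 10), (0, 2): Fraction(1, 10),
    (1, 0): Fraction(2, 10), (1, 1): Fraction(1, 10), (1, 2): Fraction(3, 10),
}
Y_values = [0, 1]
Z_values = [0, 1, 2]

# (1.17a): marginal of Y, summing the joint function over the outcomes of Z
f = {y: sum(g[(y, z)] for z in Z_values) for y in Y_values}

# (1.17b): marginal of Z, summing the joint function over the outcomes of Y
h = {z: sum(g[(y, z)] for y in Y_values) for z in Z_values}

# (1.18a): conditional probability function of Y given Z = 2
f_given_z2 = {y: g[(y, 2)] / h[2] for y in Y_values}

print(f, h, f_given_z2)
```

Each marginal and each conditional function sums to 1, which is a quick sanity check on any such table.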




Covariance 

The covariance of Y and Z is denoted by $\sigma(Y, Z)$ and is defined:

(1.19)  $\sigma(Y, Z) = E\{[Y - E(Y)][Z - E(Z)]\}$

An equivalent expression is:

(1.19a)  $\sigma(Y, Z) = E(YZ) - [E(Y)][E(Z)]$

The covariance of $a_1 + c_1 Y$ and $a_2 + c_2 Z$ is denoted by $\sigma(a_1 + c_1 Y, a_2 + c_2 Z)$,
and we have:

(1.20)  $\sigma(a_1 + c_1 Y, a_2 + c_2 Z) = c_1 c_2 \sigma(Y, Z)$  where $a_1$, $a_2$, $c_1$, $c_2$ are constants

Special cases of this are:

(1.20a)  $\sigma(c_1 Y, c_2 Z) = c_1 c_2 \sigma(Y, Z)$

(1.20b)  $\sigma(a_1 + Y, a_2 + Z) = \sigma(Y, Z)$

By definition, we have:

(1.21)  $\sigma(Y, Y) = \sigma^2(Y)$

where $\sigma^2(Y)$ is the variance of Y.
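
The covariance results can be verified on a small joint distribution. The sketch below is illustrative only: the joint table `g`, the constants, and the helper `E` are assumptions, not values from the text.

```python
from fractions import Fraction

# Assumed joint probability function g(Y_s, Z_t), keyed by (y, z)
g = {
    (0, 0): Fraction(1, 10), (0, 1): Fraction(2, 10), (0, 2): Fraction(1, 10),
    (1, 0): Fraction(2, 10), (1, 1): Fraction(1, 10), (1, 2): Fraction(3, 10),
}


def E(func):
    """Expected value of func(Y, Z) under the joint probability function g."""
    return sum(p * func(y, z) for (y, z), p in g.items())


EY = E(lambda y, z: y)
EZ = E(lambda y, z: z)

# (1.19): definition of the covariance
cov_def = E(lambda y, z: (y - EY) * (z - EZ))

# (1.19a): equivalent expression
cov_short = E(lambda y, z: y * z) - EY * EZ

# (1.20): covariance of a1 + c1*Y and a2 + c2*Z
a1, c1, a2, c2 = 4, 2, 7, -3
cov_linear = (
    E(lambda y, z: (a1 + c1 * y) * (a2 + c2 * z))
    - E(lambda y, z: a1 + c1 * y) * E(lambda y, z: a2 + c2 * z)
)

print(cov_def, cov_short, cov_linear)
```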

Independent  random  variables 

(1.22)  Random variables Y and Z are independent if and only if:

        $g(Y_s, Z_t) = f(Y_s)h(Z_t)$  $s = 1, \ldots, k$; $t = 1, \ldots, m$

If Y and Z are independent random variables:

(1.23)  $\sigma(Y, Z) = 0$  when Y, Z are independent

(In the special case where Y and Z are jointly normally distributed, $\sigma(Y, Z) = 0$
implies that Y and Z are independent.)

Functions  of  random  variables 

Let $Y_1, \ldots, Y_n$ be n random variables. Consider the function $\sum a_i Y_i$ where the $a_i$
are constants. We then have:

(1.24a)  $E\left(\sum_{i=1}^{n} a_i Y_i\right) = \sum_{i=1}^{n} a_i E(Y_i)$  where the $a_i$ are constants

(1.24b)  $\sigma^2\left(\sum_{i=1}^{n} a_i Y_i\right) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j \sigma(Y_i, Y_j)$  where the $a_i$ are constants

Specifically, we have for n = 2:

(1.25a)  $E(a_1 Y_1 + a_2 Y_2) = a_1 E(Y_1) + a_2 E(Y_2)$

(1.25b)  $\sigma^2(a_1 Y_1 + a_2 Y_2) = a_1^2 \sigma^2(Y_1) + a_2^2 \sigma^2(Y_2) + 2 a_1 a_2 \sigma(Y_1, Y_2)$

If the random variables $Y_i$ are independent, we have:

(1.26)  $\sigma^2\left(\sum_{i=1}^{n} a_i Y_i\right) = \sum_{i=1}^{n} a_i^2 \sigma^2(Y_i)$  when the $Y_i$ are independent

Special cases of this are:

(1.26a)  $\sigma^2(Y_1 + Y_2) = \sigma^2(Y_1) + \sigma^2(Y_2)$  when $Y_1$, $Y_2$ are independent

(1.26b)  $\sigma^2(Y_1 - Y_2) = \sigma^2(Y_1) + \sigma^2(Y_2)$  when $Y_1$, $Y_2$ are independent

When the $Y_i$ are independent random variables, the covariance of two linear
functions $\sum a_i Y_i$ and $\sum c_i Y_i$ is:

(1.27)  $\sigma\left(\sum_{i=1}^{n} a_i Y_i, \sum_{i=1}^{n} c_i Y_i\right) = \sum_{i=1}^{n} a_i c_i \sigma^2(Y_i)$  when the $Y_i$ are independent

Central  limit  theorem 

(1.28)  If $Y_1, \ldots, Y_n$ are independent random observations from a population
        with probability function f(Y) for which $\sigma^2(Y)$ is finite, the
        sample mean $\bar{Y}$:

        $\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n}$

        is approximately normally distributed when the sample size n is
        reasonably large, with mean E(Y) and variance $\sigma^2(Y)/n$.
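
The central limit theorem can be illustrated by simulation. The sketch below assumes a population of fair-die outcomes (mean 3.5, variance 35/12), a sample size of 30, and 20,000 replications; all of these settings, and the seed, are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is reproducible
n = 30                   # sample size (assumed)
replications = 20000     # number of simulated samples (assumed)

population_mean = 3.5            # E(Y) for a fair die
population_variance = 35 / 12    # variance of Y for a fair die

# Draw many samples and record the sample mean of each
sample_means = [
    statistics.fmean(rng.randint(1, 6) for _ in range(n))
    for _ in range(replications)
]

mean_of_means = statistics.fmean(sample_means)
variance_of_means = statistics.pvariance(sample_means)

# Under approximate normality, about 95 percent of the sample means should
# lie within 1.96 standard deviations of E(Y)
sd = (population_variance / n) ** 0.5
share_within = sum(
    abs(m - population_mean) <= 1.96 * sd for m in sample_means
) / replications

print(mean_of_means, variance_of_means, population_variance / n, share_within)
```

The simulated mean and variance of the sample means should land close to E(Y) and to the population variance divided by n, respectively, though the exact figures depend on the seed.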


1.4  NORMAL  PROBABILITY  DISTRIBUTION  AND 
RELATED  DISTRIBUTIONS 


Normal  probability  distribution 

The density function for a normal random variable Y is:

(1.29)  $f(Y) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{Y - \mu}{\sigma}\right)^2\right]$  $-\infty < Y < +\infty$

where $\mu$ and $\sigma$ are the two parameters of the normal distribution and exp(a)
denotes $e^a$.

The  mean  and  variance  of  a  normal  random  variable  Y  are: 

(1.30a)  $E(Y) = \mu$

(1.30b)  $\sigma^2(Y) = \sigma^2$

Function of normal random variable. A linear function of a normal
random variable Y has the following property:

(1.31)  If Y is a normal random variable, the transformed variable
        $Y' = a + cY$ (a and c are constants) is normally distributed, with
        mean $a + cE(Y)$ and variance $c^2\sigma^2(Y)$.

Standard normal variable. The standard normal variable z:

(1.32)  $z = \frac{Y - \mu}{\sigma}$  where Y is a normal random variable

is normally distributed, with mean 0 and variance 1. We denote this as follows:

(1.33)  z is N(0, 1)

where the first argument of N is the mean and the second is the variance.

Table A-1 in the Appendix contains the cumulative probabilities A for
percentiles z(A) where:

(1.34)  $P\{z \le z(A)\} = A$

For instance, when z(A) = 2.00, A = .9772. Because the normal distribution is
symmetrical about 0, when z(A) = -2.00, A = 1 - .9772 = .0228.
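
The table lookups just described can be reproduced with the `NormalDist` class in Python's standard `statistics` module. This is only a sketch; the normal variable with mean 100 and standard deviation 15 is an assumed example used to show the standardization in (1.32).

```python
from statistics import NormalDist

z = NormalDist(mu=0.0, sigma=1.0)   # the standard normal variable, N(0, 1)

A_upper = z.cdf(2.00)               # cumulative probability for z(A) = 2.00
A_lower = z.cdf(-2.00)              # by symmetry, 1 - A_upper
z_back = z.inv_cdf(0.9772)          # percentile z(A) for A = .9772

# (1.32): standardizing an assumed normal variable Y with mean 100 and
# standard deviation 15 gives the same probability as the z scale
Y = NormalDist(mu=100.0, sigma=15.0)
same_probability = Y.cdf(130.0)     # (130 - 100) / 15 = 2.00

print(round(A_upper, 4), round(A_lower, 4), round(z_back, 2))
```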

Function of independent normal random variables. Let $Y_1, \ldots, Y_n$ be
independent normal random variables. We then have:

(1.35)  When $Y_1, \ldots, Y_n$ are independent normal random variables, the
        linear combination $a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n$ is normally distributed,
        with mean $\sum a_i E(Y_i)$ and variance $\sum a_i^2 \sigma^2(Y_i)$.


$\chi^2$ distribution

Let $z_1, \ldots, z_\nu$ be $\nu$ independent standard normal variables. We then define:

(1.36)  $\chi^2(\nu) = z_1^2 + z_2^2 + \cdots + z_\nu^2$  where the $z_i$ are independent

The $\chi^2$ distribution has one parameter, $\nu$, which is called the degrees of freedom
(df). The mean of the $\chi^2$ distribution with $\nu$ degrees of freedom is:

(1.37)  $E[\chi^2(\nu)] = \nu$




Table A-3 in the Appendix contains percentiles of various $\chi^2$ distributions.
We define $\chi^2(A; \nu)$ as follows:

(1.38)  $P\{\chi^2(\nu) \le \chi^2(A; \nu)\} = A$

Suppose $\nu = 5$. The 90th percentile of the $\chi^2$ distribution with 5 degrees of
freedom is $\chi^2(.90; 5) = 9.24$.
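
The definition of a chi-square variable as a sum of squared independent standard normal variables lends itself to a quick simulation check of the mean and of the tabled 90th percentile for 5 degrees of freedom. The seed and the number of replications below are arbitrary assumptions.

```python
import random

rng = random.Random(1)   # fixed seed so the run is reproducible
nu = 5                   # degrees of freedom
replications = 20000     # number of simulated values (assumed)

# Each draw is a sum of nu squared independent standard normal values
draws = [
    sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))
    for _ in range(replications)
]

sample_mean = sum(draws) / replications

# Share of draws at or below 9.24, the tabled 90th percentile quoted above
share_below = sum(d <= 9.24 for d in draws) / replications

print(sample_mean, share_below)
```

The simulated mean should be near 5 and the share near .90, up to sampling error.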


t  distribution 

Let z and $\chi^2(\nu)$ be independent random variables (standard normal and $\chi^2$,
respectively). We then define:

(1.39)  $t(\nu) = \frac{z}{\left[\frac{\chi^2(\nu)}{\nu}\right]^{1/2}}$  where z and $\chi^2(\nu)$ are independent

The t distribution has one parameter, the degrees of freedom $\nu$. The mean of the
t distribution with $\nu$ degrees of freedom is:

(1.40)  $E[t(\nu)] = 0$

Table A-2 in the Appendix contains percentiles of various t distributions. We
define $t(A; \nu)$ as follows:

(1.41)  $P\{t(\nu) \le t(A; \nu)\} = A$

Suppose $\nu = 10$. The 90th percentile of the t distribution with 10 degrees of
freedom is t(.90; 10) = 1.372. Because the t distribution is symmetrical about 0,
we have t(.10; 10) = -1.372.


F  distribution 

Let $\chi^2(\nu_1)$ and $\chi^2(\nu_2)$ be two independent $\chi^2$ random variables. We then
define:

(1.42)  $F(\nu_1, \nu_2) = \frac{\chi^2(\nu_1)}{\nu_1} \div \frac{\chi^2(\nu_2)}{\nu_2}$  where $\chi^2(\nu_1)$ and $\chi^2(\nu_2)$ are independent

The F distribution has two parameters, the numerator degrees of freedom and the
denominator degrees of freedom, here $\nu_1$ and $\nu_2$, respectively.

Table A-4 in the Appendix contains percentiles of various F distributions. We
define $F(A; \nu_1, \nu_2)$ as follows:

(1.43)  $P\{F(\nu_1, \nu_2) \le F(A; \nu_1, \nu_2)\} = A$

Suppose $\nu_1 = 2$, $\nu_2 = 3$. The 90th percentile of the F distribution with 2 and 3
degrees of freedom, respectively, in the numerator and denominator is
F(.90; 2, 3) = 5.46.


Percentiles below 50 percent can be obtained by utilizing the relation:

(1.44)  $F(A; \nu_1, \nu_2) = \frac{1}{F(1 - A; \nu_2, \nu_1)}$

Thus, F(.10; 3, 2) = 1/F(.90; 2, 3) = 1/5.46 = .183.

The following relation exists between the t and F random variables:

(1.45a)  $[t(\nu)]^2 = F(1, \nu)$

and the percentiles of the t and F distributions are related as follows:

(1.45b)  $[t(.5 + A/2; \nu)]^2 = F(A; 1, \nu)$


1.5  STATISTICAL ESTIMATION

Properties  of  estimators 

(1.46)  An estimator $\hat{\theta}$ of the parameter $\theta$ is unbiased if:

        $E(\hat{\theta}) = \theta$

(1.47)  An estimator $\hat{\theta}$ is a consistent estimator of $\theta$ if:

        $\lim_{n \to \infty} P(|\hat{\theta} - \theta| \ge \varepsilon) = 0$  for any $\varepsilon > 0$

(1.48)  An estimator $\hat{\theta}$ is a sufficient estimator of $\theta$ if the conditional joint
        probability function of the sample observations, given $\hat{\theta}$, does not
        depend on the parameter $\theta$.

(1.49)  An estimator $\hat{\theta}$ is a minimum variance estimator of $\theta$ if for any
        other estimator $\theta^*$:

        $\sigma^2(\hat{\theta}) \le \sigma^2(\theta^*)$  for all $\theta^*$


Maximum  likelihood  estimators 

The method of maximum likelihood is a general method of finding estimators.
Suppose we are sampling a population whose probability function $f(Y; \theta)$
involves one parameter, $\theta$. Given independent observations $Y_1, \ldots, Y_n$, the joint
probability function of the sample observations is:

(1.50a)  $g(Y_1, \ldots, Y_n) = \prod_{i=1}^{n} f(Y_i; \theta)$

When this joint probability function is viewed as a function of $\theta$, with the
observations given, it is called the likelihood function $L(\theta)$:

(1.50b)  $L(\theta) = \prod_{i=1}^{n} f(Y_i; \theta)$

Maximizing $L(\theta)$ with respect to $\theta$ yields the maximum likelihood estimator of $\theta$.
Under quite general conditions, maximum likelihood estimators are consistent
and sufficient.
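
As an illustration only, the following sketch maximizes a likelihood function numerically. It assumes a Poisson probability function $f(Y; \theta) = \theta^Y e^{-\theta}/Y!$ and a small set of hypothetical observations; neither comes from the text. The grid search is a deliberately simple stand-in for a proper optimizer. For the Poisson case the maximum likelihood estimator is known to equal the sample mean, which gives a check on the result.

```python
import math

def log_likelihood(theta, ys):
    """Log of L(theta) = product of f(Y_i; theta), Poisson case (an assumption)."""
    return sum(y * math.log(theta) - theta - math.lgamma(y + 1) for y in ys)

def mle_grid(ys, step=0.01, upper=20.0):
    """Return the grid value of theta with the largest log-likelihood."""
    grid = [step * k for k in range(1, int(upper / step) + 1)]
    return max(grid, key=lambda theta: log_likelihood(theta, ys))

ys = [3, 5, 2, 4, 6]                 # hypothetical observations
theta_hat = mle_grid(ys)
sample_mean = sum(ys) / len(ys)
print(theta_hat, sample_mean)        # the two values should nearly coincide
```

Working with the logarithm of $L(\theta)$ rather than $L(\theta)$ itself does not change the maximizing value and avoids products of many small numbers.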


Least  squares  estimators 

The method of least squares is another general method of finding estimators.
The sample observations are assumed to be of the form (for the case of a single
parameter $\theta$):

(1.51)  $Y_i = f_i(\theta) + \varepsilon_i \qquad i = 1, \ldots, n$

where $f_i(\theta)$ is a known function of the parameter $\theta$ and the $\varepsilon_i$ are random
variables, usually assumed to have expectation $E(\varepsilon_i) = 0$.

With the method of least squares, for the given sample observations, the sum
of squares:

(1.52)  $Q = \sum_{i=1}^{n} [Y_i - f_i(\theta)]^2$

is considered as a function of $\theta$. The least squares estimator of $\theta$ is obtained by
minimizing Q with respect to $\theta$. In many instances, least squares estimators are
unbiased and consistent.
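
A minimal sketch of the least squares criterion, assuming the particular form $f_i(\theta) = \theta X_i$ with known constants $X_i$ (this form and the data are hypothetical, chosen only for illustration). Setting the derivative of Q to zero gives $\hat{\theta} = \sum X_i Y_i / \sum X_i^2$ for this case, and the code checks that nearby values of $\theta$ give a larger Q.

```python
def q(theta, xs, ys):
    """Sum of squares Q in (1.52) for the assumed form f_i(theta) = theta * X_i."""
    return sum((y - theta * x) ** 2 for x, y in zip(xs, ys))

def least_squares_theta(xs, ys):
    """Minimizer of Q for f_i(theta) = theta * X_i."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

xs = [1.0, 2.0, 3.0, 4.0]            # hypothetical known constants
ys = [2.1, 3.9, 6.2, 7.8]            # hypothetical observations
theta_hat = least_squares_theta(xs, ys)
print(theta_hat, q(theta_hat, xs, ys))
```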

1.6  INFERENCES  ABOUT  POPULATION  MEAN— NORMAL 
POPULATION 

We have a random sample of n observations $Y_1, \ldots, Y_n$ from a normal population
with mean $\mu$ and standard deviation $\sigma$. The sample mean and sample standard
deviation are:

(1.53a)  $\bar{Y} = \dfrac{\sum_i Y_i}{n}$

(1.53b)  $s = \left[\dfrac{\sum_i (Y_i - \bar{Y})^2}{n - 1}\right]^{1/2}$

and the estimated standard deviation of the sampling distribution of $\bar{Y}$ is:

(1.53c)  $s(\bar{Y}) = \dfrac{s}{\sqrt{n}}$

We then have:

(1.54)  $\dfrac{\bar{Y} - \mu}{s(\bar{Y})}$ is distributed as t with $n - 1$ degrees of freedom when the
        random sample is from a normal population.




Interval  estimation 

The confidence limits for $\mu$ with a confidence coefficient of $1 - \alpha$ are
obtained by means of (1.54):

(1.55)  $\bar{Y} \pm t(1 - \alpha/2; n - 1)\, s(\bar{Y})$

Example 1. Obtain a 95 percent confidence interval for $\mu$ when:

        $n = 10 \qquad \bar{Y} = 20 \qquad s = 4$

We require:

        $s(\bar{Y}) = \dfrac{4}{\sqrt{10}} = 1.265 \qquad t(.975; 9) = 2.262$

so that the confidence limits are $20 \pm 2.262(1.265)$. Hence, the 95 percent
confidence interval for $\mu$ is:

        $17.1 \le \mu \le 22.9$
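
The arithmetic of Example 1 can be retraced with a short sketch. The function name is hypothetical, and the t percentile is passed in as read from the table rather than computed.

```python
import math

def mean_confidence_limits(ybar, s, n, t_value):
    """Confidence limits (1.55): ybar +/- t * s / sqrt(n)."""
    se = s / math.sqrt(n)            # estimated standard deviation of the mean, (1.53c)
    return ybar - t_value * se, ybar + t_value * se

lower, upper = mean_confidence_limits(ybar=20, s=4, n=10, t_value=2.262)
print(round(lower, 1), round(upper, 1))   # 17.1 22.9
```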


Tests 


One-sided and two-sided tests concerning the population mean $\mu$ are
constructed by means of (1.54), based on the test statistic:

(1.56)  $t^* = \dfrac{\bar{Y} - \mu_0}{s(\bar{Y})}$

Table 1.1 contains the decision rules for each of three possible cases, with the
risk of making a Type I error controlled at $\alpha$.

TABLE 1.1  Decision rules for tests concerning mean $\mu$ of normal population

Alternatives               Decision Rule

(a)
$H_0$: $\mu = \mu_0$       If $|t^*| \le t(1 - \alpha/2; n - 1)$, conclude $H_0$
$H_a$: $\mu \ne \mu_0$     If $|t^*| > t(1 - \alpha/2; n - 1)$, conclude $H_a$

                           where: $t^* = \dfrac{\bar{Y} - \mu_0}{s(\bar{Y})}$

(b)
$H_0$: $\mu \ge \mu_0$     If $t^* \ge t(\alpha; n - 1)$, conclude $H_0$
$H_a$: $\mu < \mu_0$       If $t^* < t(\alpha; n - 1)$, conclude $H_a$

(c)
$H_0$: $\mu \le \mu_0$     If $t^* \le t(1 - \alpha; n - 1)$, conclude $H_0$
$H_a$: $\mu > \mu_0$       If $t^* > t(1 - \alpha; n - 1)$, conclude $H_a$



Example 2. Choose between the alternatives:

        $H_0$: $\mu \le 20$
        $H_a$: $\mu > 20$

when $\alpha$ is to be controlled at .05 and:

        $n = 15 \qquad \bar{Y} = 24 \qquad s = 6$

We require:

        $s(\bar{Y}) = \dfrac{6}{\sqrt{15}} = 1.549$

        $t(.95; 14) = 1.761$

so that the decision rule is:

        If $t^* \le 1.761$, conclude $H_0$
        If $t^* > 1.761$, conclude $H_a$

Since $t^* = (24 - 20)/1.549 = 2.58 > 1.761$, we conclude $H_a$.


Example 3. Choose between the alternatives:

        $H_0$: $\mu = 10$
        $H_a$: $\mu \ne 10$

when $\alpha$ is to be controlled at .02 and:

        $n = 25 \qquad \bar{Y} = 5.7 \qquad s = 8$

We require:

        $s(\bar{Y}) = \dfrac{8}{\sqrt{25}} = 1.6$

        $t(.99; 24) = 2.492$

so that the decision rule is:

        If $|t^*| \le 2.492$, conclude $H_0$
        If $|t^*| > 2.492$, conclude $H_a$

where the symbol $|\ |$ stands for the absolute value. Since $|t^*| = |(5.7 - 10)/1.6|$
$= |-2.69| = 2.69 > 2.492$, we conclude $H_a$.
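
The three decision rules of Table 1.1 can be sketched as a small function and applied to Examples 2 and 3. The function names and the case labels "a", "b", "c" are illustrative choices; the critical t value is supplied from the table, not computed.

```python
import math

def t_statistic(ybar, mu0, s, n):
    """Test statistic (1.56)."""
    return (ybar - mu0) / (s / math.sqrt(n))

def decide(t_star, t_crit, case):
    """Apply Table 1.1.
    case "a": two-sided, t_crit = t(1 - alpha/2; n - 1)
    case "b": Ha is mu < mu0, t_crit = t(alpha; n - 1)
    case "c": Ha is mu > mu0, t_crit = t(1 - alpha; n - 1)
    """
    if case == "a":
        return "Ha" if abs(t_star) > t_crit else "H0"
    if case == "b":
        return "Ha" if t_star < t_crit else "H0"
    if case == "c":
        return "Ha" if t_star > t_crit else "H0"
    raise ValueError("case must be 'a', 'b', or 'c'")

t2 = t_statistic(ybar=24, mu0=20, s=6, n=15)     # Example 2
t3 = t_statistic(ybar=5.7, mu0=10, s=8, n=25)    # Example 3
print(decide(t2, 1.761, "c"), decide(t3, 2.492, "a"))
```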

P-value for sample outcome. The P-value for a sample outcome is the
probability that the sample outcome could have been more extreme than the
observed one when $\mu = \mu_0$. Large P-values support $H_0$ while small P-values
support $H_a$. A test can be carried out by comparing the P-value with the specified
$\alpha$ risk. If the P-value equals or is greater than the specified $\alpha$, $H_0$ is concluded. If
the P-value is less than $\alpha$, $H_a$ is concluded.




Example 4. In Example 2, $t^* = 2.58$. The P-value for this sample outcome
is the probability $P[t(14) > 2.58]$. From Table A-2, we find $t(.985; 14) = 2.415$
and $t(.990; 14) = 2.624$. Hence, the P-value is between .010 and .015. In
fact, it can be shown to be .011. Thus, for $\alpha = .05$, $H_a$ is concluded.

Example 5. In Example 3, $t^* = -2.69$. We find from Table A-2 that
$P[t(24) < -2.69]$ is between .005 and .0075. In fact, it can be shown to be
.0064. Because the test is two-sided and the t distribution is symmetrical, the
two-sided P-value is twice the one-sided value, or $2(.0064) = .013$. Hence, for
$\alpha = .02$, we conclude $H_a$.
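
The tail probabilities quoted in Examples 4 and 5 can be approximated without tables. The sketch below is one possible approach and not the text's method: it integrates the standard t density with Simpson's rule. The function names and the number of intervals are arbitrary choices.

```python
import math

def t_density(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t_value, df, intervals=2000):
    """P[t(df) > t_value] for t_value >= 0, via Simpson's rule on [0, t_value]."""
    h = t_value / intervals
    total = t_density(0.0, df) + t_density(t_value, df)
    for k in range(1, intervals):
        total += (4 if k % 2 else 2) * t_density(k * h, df)
    return 0.5 - total * h / 3       # by symmetry, half the mass lies above zero

p_one_sided = upper_tail(2.58, 14)        # Example 4
p_two_sided = 2 * upper_tail(2.69, 24)    # Example 5, using symmetry
print(round(p_one_sided, 3), round(p_two_sided, 3))
```

The printed values should agree, to rounding, with the .011 and .013 stated in the examples.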

Relation between tests and confidence intervals. There is a direct relation
between tests and confidence intervals. For example, the two-sided confidence
limits (1.55) can be used for testing:

        $H_0$: $\mu = \mu_0$
        $H_a$: $\mu \ne \mu_0$

If $\mu_0$ is contained within the $1 - \alpha$ confidence interval, then the two-sided
decision rule in Table 1.1a, with level of significance $\alpha$, will lead to conclusion $H_0$,
and vice versa. If $\mu_0$ is not contained within the confidence interval, the decision
rule will lead to $H_a$, and vice versa.

There  are  similar  correspondences  between  one-sided  confidence  intervals 
and  one-sided  decision  rules. 
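
The correspondence can be checked on the data of Example 3 ($n = 25$, $\bar{Y} = 5.7$, $s = 8$, $\alpha = .02$). This is a sketch with hypothetical function names; both routes use the same tabled value $t(.99; 24) = 2.492$.

```python
import math

def conclude_by_interval(ybar, s, n, t_value, mu0):
    """Conclude H0 when mu0 lies inside the 1 - alpha confidence interval (1.55)."""
    se = s / math.sqrt(n)
    lower, upper = ybar - t_value * se, ybar + t_value * se
    return "H0" if lower <= mu0 <= upper else "Ha"

def conclude_by_test(ybar, s, n, t_value, mu0):
    """Two-sided decision rule of Table 1.1a."""
    t_star = (ybar - mu0) / (s / math.sqrt(n))
    return "H0" if abs(t_star) <= t_value else "Ha"

by_interval = conclude_by_interval(5.7, 8, 25, 2.492, mu0=10)
by_test = conclude_by_test(5.7, 8, 25, 2.492, mu0=10)
print(by_interval, by_test)   # both routes give the same conclusion
```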

1.7  COMPARISONS  OF  TWO  POPULATION 
MEANS— NORMAL  POPULATIONS 

Independent  samples 

There are two normal populations, with means $\mu_1$ and $\mu_2$, respectively, and
with the same standard deviation $\sigma$. The means $\mu_1$ and $\mu_2$ are to be compared on
the basis of independent samples for each of the two populations:

        Sample 1:  $Y_1, \ldots, Y_{n_1}$
        Sample 2:  $Z_1, \ldots, Z_{n_2}$

Estimators of the two population means are the sample means:

(1.57a)  $\bar{Y} = \dfrac{\sum_i Y_i}{n_1}$

(1.57b)  $\bar{Z} = \dfrac{\sum_i Z_i}{n_2}$

and an estimator of $\mu_1 - \mu_2$ is $\bar{Y} - \bar{Z}$.




An estimator of the common variance $\sigma^2$ is:

(1.58)  $s^2 = \dfrac{\sum_i (Y_i - \bar{Y})^2 + \sum_i (Z_i - \bar{Z})^2}{n_1 + n_2 - 2}$

and an estimator of $\sigma^2(\bar{Y} - \bar{Z})$, the variance of the sampling distribution of
$\bar{Y} - \bar{Z}$, is:

(1.59)  $s^2(\bar{Y} - \bar{Z}) = s^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)$

We have:

(1.60)  $\dfrac{(\bar{Y} - \bar{Z}) - (\mu_1 - \mu_2)}{s(\bar{Y} - \bar{Z})}$ is distributed as t with $n_1 + n_2 - 2$ degrees
        of freedom when the two independent samples come from normal
        populations with the same standard deviation.


Interval estimation. The confidence limits for $\mu_1 - \mu_2$ with confidence
coefficient $1 - \alpha$ are obtained by means of (1.60):

(1.61)  $(\bar{Y} - \bar{Z}) \pm t(1 - \alpha/2; n_1 + n_2 - 2)\, s(\bar{Y} - \bar{Z})$

Example 6. Obtain a 95 percent confidence interval for $\mu_1 - \mu_2$ when:

        $n_1 = 10 \qquad \bar{Y} = 14 \qquad \sum (Y_i - \bar{Y})^2 = 105$
        $n_2 = 20 \qquad \bar{Z} = 8 \qquad \sum (Z_i - \bar{Z})^2 = 224$

We require:

        $s^2 = \dfrac{105 + 224}{10 + 20 - 2} = 11.75$

        $s^2(\bar{Y} - \bar{Z}) = 11.75 \left(\dfrac{1}{10} + \dfrac{1}{20}\right) = 1.7625$

        $s(\bar{Y} - \bar{Z}) = 1.328$

        $t(.975; 28) = 2.048$

        $3.3 = (14 - 8) - 2.048(1.328) \le \mu_1 - \mu_2 \le (14 - 8) + 2.048(1.328) = 8.7$
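
The steps of Example 6 can be retraced as follows. This is a sketch with a hypothetical function name; the t percentile is taken from the table.

```python
import math

def pooled_difference_limits(ybar, zbar, ss_y, ss_z, n1, n2, t_value):
    """Confidence limits (1.61) from summary statistics.
    ss_y and ss_z are the sums of squared deviations about each sample mean."""
    s_sq = (ss_y + ss_z) / (n1 + n2 - 2)            # pooled variance, (1.58)
    se = math.sqrt(s_sq * (1 / n1 + 1 / n2))        # square root of (1.59)
    diff = ybar - zbar
    return diff - t_value * se, diff + t_value * se, se

lower, upper, se = pooled_difference_limits(14, 8, 105, 224, 10, 20, t_value=2.048)
print(round(se, 3), round(lower, 1), round(upper, 1))   # 1.328 3.3 8.7
```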


Tests. One-sided and two-sided tests concerning $\mu_1 - \mu_2$ are constructed by
means of (1.60). Table 1.2 contains the decision rules for each of three possible
cases, based on the test statistic:

(1.62)  $t^* = \dfrac{\bar{Y} - \bar{Z}}{s(\bar{Y} - \bar{Z})}$

with the risk of making a Type I error controlled at $\alpha$.




TABLE 1.2  Decision rules for tests concerning means $\mu_1$ and $\mu_2$ of two
normal populations ($\sigma_1 = \sigma_2 = \sigma$)

Alternatives               Decision Rule

(a)
$H_0$: $\mu_1 = \mu_2$     If $|t^*| \le t(1 - \alpha/2; n_1 + n_2 - 2)$, conclude $H_0$
$H_a$: $\mu_1 \ne \mu_2$   If $|t^*| > t(1 - \alpha/2; n_1 + n_2 - 2)$, conclude $H_a$

                           where: $t^* = \dfrac{\bar{Y} - \bar{Z}}{s(\bar{Y} - \bar{Z})}$

(b)
$H_0$: $\mu_1 \ge \mu_2$   If $t^* \ge t(\alpha; n_1 + n_2 - 2)$, conclude $H_0$
$H_a$: $\mu_1 < \mu_2$     If $t^* < t(\alpha; n_1 + n_2 - 2)$, conclude $H_a$

(c)
$H_0$: $\mu_1 \le \mu_2$   If $t^* \le t(1 - \alpha; n_1 + n_2 - 2)$, conclude $H_0$
$H_a$: $\mu_1 > \mu_2$     If $t^* > t(1 - \alpha; n_1 + n_2 - 2)$, conclude $H_a$


Example 7. Choose between the alternatives:

        $H_0$: $\mu_1 = \mu_2$
        $H_a$: $\mu_1 \ne \mu_2$

when $\alpha$ is to be controlled at .10 and the data are those of Example 6. We require
$t(.95; 28) = 1.701$, so that the decision rule is:

        If $|t^*| \le 1.701$, conclude $H_0$
        If $|t^*| > 1.701$, conclude $H_a$

Since $|t^*| = |(14 - 8)/1.328| = |4.52| = 4.52 > 1.701$, we conclude $H_a$.

The one-sided P-value here is the probability $P[t(28) > 4.52]$. We see from
Table A-2 that this P-value is less than .0005. In fact, it can be shown to be
.00005. Hence, the two-sided P-value is .0001. For $\alpha = .10$, the appropriate
conclusion therefore is $H_a$.


Paired  observations 

When the observations in the two samples are paired (e.g., attitude scores $Y_i$
and $Z_i$ for the ith sample employee before and after a year's experience on the
job), we use the differences:

(1.63)  $W_i = Y_i - Z_i$

in the fashion of a sample from a single population. Thus, when the $W_i$ can be
treated as observations from a normal population, we have:

(1.64)  $\dfrac{\bar{W} - (\mu_1 - \mu_2)}{s(\bar{W})}$ is distributed as t with $n - 1$ degrees of freedom
        when the differences $W_i$ can be considered to be observations from
        a normal population and:

        $\bar{W} = \dfrac{\sum_i W_i}{n}$

        $s^2(\bar{W}) = \dfrac{\sum_i (W_i - \bar{W})^2}{n - 1} \div n$

1.8  INFERENCES  ABOUT  POPULATION 
VARIANCE— NORMAL  POPULATION 


When sampling from a normal population, the following holds for the sample
variance $s^2$ where s is defined in (1.53b):

(1.65)  $\dfrac{(n - 1)s^2}{\sigma^2}$ is distributed as $\chi^2$ with $n - 1$ degrees of freedom when
        the random sample is from a normal population.


Interval  estimation 


The lower confidence limit L and the upper confidence limit U in a confidence
interval for the population variance $\sigma^2$ with confidence coefficient $1 - \alpha$ are
obtained by means of (1.65):

(1.66)  $L = \dfrac{(n - 1)s^2}{\chi^2(1 - \alpha/2; n - 1)} \qquad U = \dfrac{(n - 1)s^2}{\chi^2(\alpha/2; n - 1)}$

Example 8. Obtain a 98 percent confidence interval for $\sigma^2$, using the data of
Example 1 ($n = 10$, $s = 4$).

We require:

        $s^2 = 16 \qquad \chi^2(.01; 9) = 2.09 \qquad \chi^2(.99; 9) = 21.67$

        $6.6 = \dfrac{9(16)}{21.67} \le \sigma^2 \le \dfrac{9(16)}{2.09} = 68.9$
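
Example 8 can be retraced with a short sketch. The function name is hypothetical, and the chi-square percentiles are passed in as read from the table.

```python
def variance_confidence_limits(s, n, chi2_lower, chi2_upper):
    """Limits (1.66). chi2_lower = chi2(alpha/2; n-1), chi2_upper = chi2(1-alpha/2; n-1)."""
    ss = (n - 1) * s ** 2
    return ss / chi2_upper, ss / chi2_lower   # the larger percentile gives the lower limit

lower, upper = variance_confidence_limits(s=4, n=10, chi2_lower=2.09, chi2_upper=21.67)
print(round(lower, 1), round(upper, 1))   # 6.6 68.9
```

Note that the interval is not symmetric about $s^2 = 16$, because the $\chi^2$ distribution is skewed.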


Tests 

One-sided and two-sided tests concerning the population variance $\sigma^2$ are
constructed by means of (1.65). Table 1.3 contains the decision rule for each of three
possible cases, with the risk of making a Type I error controlled at $\alpha$.




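TABLE 1.3  Decision rules for tests concerning variance $\sigma^2$ of normal
population

Alternatives                        Decision Rule

(a)
$H_0$: $\sigma^2 = \sigma_0^2$      If $\chi^2(\alpha/2; n - 1) \le \dfrac{(n - 1)s^2}{\sigma_0^2} \le \chi^2(1 - \alpha/2; n - 1)$,
                                    conclude $H_0$
$H_a$: $\sigma^2 \ne \sigma_0^2$    Otherwise conclude $H_a$

(b)
$H_0$: $\sigma^2 \ge \sigma_0^2$    If $\dfrac{(n - 1)s^2}{\sigma_0^2} \ge \chi^2(\alpha; n - 1)$, conclude $H_0$
$H_a$: $\sigma^2 < \sigma_0^2$      If $\dfrac{(n - 1)s^2}{\sigma_0^2} < \chi^2(\alpha; n - 1)$, conclude $H_a$

(c)
$H_0$: $\sigma^2 \le \sigma_0^2$    If $\dfrac{(n - 1)s^2}{\sigma_0^2} \le \chi^2(1 - \alpha; n - 1)$, conclude $H_0$
$H_a$: $\sigma^2 > \sigma_0^2$      If $\dfrac{(n - 1)s^2}{\sigma_0^2} > \chi^2(1 - \alpha; n - 1)$, conclude $H_a$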


1.9  COMPARISONS  OF  TWO  POPULATION 
VARIANCES— NORMAL  POPULATIONS 


Independent samples are selected from two normal populations, with means
and variances of $\mu_1$ and $\sigma_1^2$ and $\mu_2$ and $\sigma_2^2$, respectively. Using the notation of
Section 1.7, the two sample variances are:

(1.67a)  $s_1^2 = \dfrac{\sum_i (Y_i - \bar{Y})^2}{n_1 - 1}$

(1.67b)  $s_2^2 = \dfrac{\sum_i (Z_i - \bar{Z})^2}{n_2 - 1}$

We have:

(1.68)  $\dfrac{s_1^2}{\sigma_1^2} \div \dfrac{s_2^2}{\sigma_2^2}$ is distributed as $F(n_1 - 1, n_2 - 1)$ when the two
        independent samples come from normal populations.


Interval  estimation 

The lower and upper confidence limits L and U for $\sigma_1^2/\sigma_2^2$ with confidence
coefficient $1 - \alpha$ are obtained by means of (1.68):

(1.69)  $L = \dfrac{s_1^2}{s_2^2} \cdot \dfrac{1}{F(1 - \alpha/2; n_1 - 1, n_2 - 1)} \qquad U = \dfrac{s_1^2}{s_2^2} \cdot \dfrac{1}{F(\alpha/2; n_1 - 1, n_2 - 1)}$

Example 9. Obtain a 90 percent confidence interval for $\sigma_1^2/\sigma_2^2$ when the data
are:

        $n_1 = 16 \qquad n_2 = 21$
        $s_1^2 = 54.2 \qquad s_2^2 = 17.8$

We require:

        $F(.05; 15, 20) = 1/F(.95; 20, 15) = 1/2.33 = .429$
        $F(.95; 15, 20) = 2.20$

        $1.4 = \dfrac{54.2}{17.8} \cdot \dfrac{1}{2.20} \le \dfrac{\sigma_1^2}{\sigma_2^2} \le \dfrac{54.2}{17.8} \cdot \dfrac{1}{.429} = 7.1$
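
Example 9 can be retraced as follows, including the use of relation (1.44) to obtain the lower F percentile from a tabled upper percentile. The function name is hypothetical; both tabled F values are inputs, not computed.

```python
def variance_ratio_limits(s1_sq, s2_sq, f_lower, f_upper):
    """Limits (1.69). f_lower = F(alpha/2; n1-1, n2-1), f_upper = F(1-alpha/2; n1-1, n2-1)."""
    ratio = s1_sq / s2_sq
    return ratio / f_upper, ratio / f_lower

f_lower = 1 / 2.33                 # F(.05; 15, 20) = 1 / F(.95; 20, 15), relation (1.44)
lower, upper = variance_ratio_limits(54.2, 17.8, f_lower=f_lower, f_upper=2.20)
print(round(lower, 1), round(upper, 1))   # 1.4 7.1
```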


Tests 

One-sided and two-sided tests concerning $\sigma_1^2/\sigma_2^2$ are constructed by means of
(1.68). Table 1.4 contains the decision rules for each of three possible cases, with
the risk of making a Type I error controlled at $\alpha$.


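TABLE 1.4  Decision rules for tests concerning variances $\sigma_1^2$ and $\sigma_2^2$ of two
normal populations

Alternatives                          Decision Rule

(a)
$H_0$: $\sigma_1^2 = \sigma_2^2$      If $F(\alpha/2; n_1 - 1, n_2 - 1) \le \dfrac{s_1^2}{s_2^2} \le F(1 - \alpha/2; n_1 - 1, n_2 - 1)$,
                                      conclude $H_0$
$H_a$: $\sigma_1^2 \ne \sigma_2^2$    Otherwise conclude $H_a$

(b)
$H_0$: $\sigma_1^2 \ge \sigma_2^2$    If $\dfrac{s_1^2}{s_2^2} \ge F(\alpha; n_1 - 1, n_2 - 1)$, conclude $H_0$
$H_a$: $\sigma_1^2 < \sigma_2^2$      If $\dfrac{s_1^2}{s_2^2} < F(\alpha; n_1 - 1, n_2 - 1)$, conclude $H_a$

(c)
$H_0$: $\sigma_1^2 \le \sigma_2^2$    If $\dfrac{s_1^2}{s_2^2} \le F(1 - \alpha; n_1 - 1, n_2 - 1)$, conclude $H_0$
$H_a$: $\sigma_1^2 > \sigma_2^2$      If $\dfrac{s_1^2}{s_2^2} > F(1 - \alpha; n_1 - 1, n_2 - 1)$, conclude $H_a$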



Example 10. Choose between the alternatives:

        $H_0$: $\sigma_1^2 = \sigma_2^2$
        $H_a$: $\sigma_1^2 \ne \sigma_2^2$

when $\alpha$ is to be controlled at .02 and the data are those of Example 9.
We require:

        $F(.01; 15, 20) = 1/F(.99; 20, 15) = 1/3.37 = .297$
        $F(.99; 15, 20) = 3.09$

so that the decision rule is:

        If $.297 \le \dfrac{s_1^2}{s_2^2} \le 3.09$, conclude $H_0$
        Otherwise conclude $H_a$

Since $s_1^2/s_2^2 = 54.2/17.8 = 3.04$, we conclude $H_0$.
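
The two-sided rule of Table 1.4a, applied to the numbers of Example 10, can be sketched as below. The function name is hypothetical and the F percentiles are taken from the table.

```python
def two_sided_variance_test(s1_sq, s2_sq, f_low, f_high):
    """Table 1.4a: conclude H0 when the variance ratio lies between the two percentiles."""
    ratio = s1_sq / s2_sq
    conclusion = "H0" if f_low <= ratio <= f_high else "Ha"
    return conclusion, ratio

conclusion, ratio = two_sided_variance_test(54.2, 17.8, f_low=1 / 3.37, f_high=3.09)
print(conclusion, ratio)
```

The ratio falls just below the upper percentile 3.09, so the conclusion depends on a narrow margin.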


PART  I 


Basic  regression  analysis 


Linear  regression  with 
one  independent  variable 


Regression analysis is a statistical tool that utilizes the relation between two or
more quantitative variables so that one variable can be predicted from the other,
or others. For example, if one knows the relation between advertising
expenditures and sales, one can predict sales by regression analysis once the level of
advertising expenditures has been set.

In  Part  I  of  this  book,  we  take  up  regression  analysis  when  a  single  predictor 
variable  is  used  for  predicting  the  variable  of  interest.  In  this  chapter  specifically, 
we  consider  the  basic  ideas  of  regression  analysis  and  discuss  the  estimation  of 
the  parameters  of  the  regression  model. 

2.1  RELATIONS  BETWEEN  VARIABLES 

The  concept  of  a  relation  between  two  variables,  such  as  between  family 
income  and  family  expenditures  for  housing,  is  a  familiar  one.  We  distinguish 
between  a  functional  relation  and  a  statistical  relation,  and  consider  each  of  these 
in  turn. 




Functional  relation  between  two  variables 

A functional relation between two variables is expressed by a mathematical
formula. If X is the independent variable and Y the dependent variable, a
functional relation is of the form:

        $Y = f(X)$

Given a particular value of X, the function f indicates the corresponding value
of Y.

Example.  Consider  the  relation  between  dollar  sales  (Y)  of  a  product  sold  at 
a  fixed  price  and  number  of  units  sold  (X).  If  the  selling  price  is  $2  per  unit,  the 
relation  is  expressed  by  the  equation: 

Y=  2X 

This  functional  relation  is  shown  in  Figure  2.1.  Number  of  units  sold  and  dollar 
sales  during  three  recent  periods  (while  the  unit  price  remained  constant  at  $2) 
were  as  follows: 


Period     Number of Units Sold     Dollar Sales
  1                 75                  $150
  2                 25                    50
  3                130                   260

These  observations  are  plotted  also  in  Figure  2.1.  Note  that  all  fall  directly  on  the 
line  of  functional  relationship.  This  is  characteristic  of  all  functional  relations. 
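
A tiny sketch makes the point concrete: with the functional relation Y = 2X, every observation in the table is reproduced exactly. The function name is an illustrative choice.

```python
def dollar_sales(units_sold, price=2):
    """Functional relation Y = 2X for a unit price of $2."""
    return price * units_sold

observations = [(75, 150), (25, 50), (130, 260)]   # (units sold, dollar sales)
all_on_line = all(dollar_sales(x) == y for x, y in observations)
print(all_on_line)   # True
```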

FIGURE 2.1  Example of functional relation

(Vertical axis: Dollar Sales)




Statistical  relation  between  two  variables 

A  statistical  relation,  unlike  a  functional  relation,  is  not  a  perfect  one.  In 
general,  the  observations  for  a  statistical  relation  do  not  fall  directly  on  the  curve 
of  relationship. 


Example 1. A certain spare part is manufactured by the Westwood Company
once a month in lots which vary in size as demand fluctuates. Table 2.1, page 36,
contains data on lot size and number of man-hours of labor for 10 recent
production runs performed under similar production conditions. These data are plotted
in Figure 2.2a. Man-hours are taken as the dependent or response variable Y, and
lot size as the independent or predictor variable X. The plotting is done as before.
For instance, the first production run results are plotted as X = 30, Y = 73.


FIGURE  2.2  Statistical  relation  between  lot  size  and  number  of  man-hours — Westwood 
Company  example 


(Horizontal axis in both panels: Lot Size)


Figure 2.2a clearly suggests that there is a relation between lot size and
number of man-hours, in the sense that the larger the lot size, the greater tends to be
the number of man-hours. However, the relation is not a perfect one. There is a
scattering of points, suggesting that some of the variation in man-hours is not
accounted for by lot size. For instance, two production runs (1 and 8) consisted
of 30 parts, yet they required somewhat different numbers of man-hours.
Because of the scattering of points in a statistical relation, Figure 2.2a is called a
scatter diagram or scatter plot. In statistical terminology, each point in the
scatter diagram represents an observation or trial.

In Figure 2.2b, we have plotted a line of relationship which describes the
statistical relation between man-hours and lot size. It indicates the general
tendency by which man-hours vary with changes in lot size. Note that most of the
points do not fall directly on the line of statistical relationship. This scattering of
points around the line represents variation in man-hours which is not associated
with the lot size, and which is usually considered to be of a random nature.
Statistical relations can be highly useful, even though they do not have the
exactitude of a functional relation.

Example  2.  Figure  2.3  presents  data  on  age  and  level  of  a  steroid  in  plasma 
for  17  healthy  females  between  8  and  25  years  old.  The  data  strongly  suggest  that 
the  statistical  relationship  is  curvilinear  (not  linear).  The  curve  of  relationship 
has  also  been  drawn  in  Figure  2.3.  It  implies  that  as  age  becomes  increasingly 
higher,  steroid  level  increases  up  to  a  point  and  then  begins  to  decline.  Note 
again  the  scattering  of  points  around  the  curve  of  statistical  relationship,  typical 
of  all  statistical  relations. 

FIGURE  2.3  Curvilinear  statistical  relation  between  age  and  steroid  level  in  healthy 
females  aged  8  to  25 


(Vertical axis: Steroid Level)


2.2  REGRESSION  MODELS  AND  THEIR  USES 
Basic  concepts 

A regression model is a formal means of expressing the two essential
ingredients of a statistical relation:

1. A tendency of the dependent variable Y to vary with the independent
   variable or variables in a systematic fashion.

2. A scattering of observations around the curve of statistical relationship.

These two characteristics are embodied in a regression model by postulating
that:

1. In the population of observations associated with the sampled process, there
   is a probability distribution of Y for each level of X.

2. The means of these probability distributions vary in some systematic fashion
   with X.

Example.  Consider  again  the  Westwood  Company  lot  size  example.  The 
number  of  man-hours  Y  is  treated  in  a  regression  model  as  a  random  variable.  For 
each  lot  size,  there  is  postulated  a  probability  distribution  of  Y.  Figure  2.4  shows 
such  a  probability  distribution  for  X  =  30,  which  is  the  lot  size  for  the  first 
production  run  in  Table  2.1.  The  actual  number  of  man-hours  Y  (73  in  our 
example  in  Table  2.1)  is  then  viewed  as  a  random  selection  from  this  probability 
distribution. 

Figure  2.4  also  shows  probability  distributions  of  Y  for  lot  sizes  X  =  50  and 
X  =  70.  Note  that  the  means  of  the  probability  distributions  have  a  systematic 
relation  to  the  level  of  X.  This  systematic  relationship  is  called  the  regression 
function  of  Y  on  X.  The  graph  of  the  regression  function  is  called  the  regression 
curve.  Note  that  in  Figure  2.4  the  regression  function  is  linear.  This  would  imply 
for  our  example  that  the  expected  (mean)  number  of  man-hours  varies  linearly 
with  lot  size. 

There  is  of  course  no  a  priori  reason  why  man-hours  need  be  linearly  related 
to  lot  size.  Figure  2.5  shows  another  possible  regression  model  for  our  example. 


FIGURE 2.4  Pictorial representation of linear regression model

(Axis label: Lot Size)


Here  the  regression  function  is  curvilinear,  with  a  shape  reflecting  economies  of 
scale  with  larger  lot  sizes.  Figure  2.5  differs  in  orientation  from  Figure  2.4  in  that 
the  X  and  Y  axes  are  plotted  conventionally  in  Figure  2.5.  While  this  makes  it  not 
quite  as  easy  to  view  the  probability  distributions,  the  orientation  of  Figure  2.5 
shows  the  regression  curve  in  the  perspective  to  be  utilized  from  here  on. 

FIGURE  2.5  Pictorial  representation  of  curvilinear  regression  model 


(Vertical axis: Man-Hours)


Regression models may differ in the form of the regression function as in
Figures 2.4 and 2.5, in the shape of the probability distributions of the Y's, and
in still other ways. Whatever the variation, the concept of a probability
distribution of Y for given X is the formal counterpart to the empirical scatter in a
statistical relation. Similarly, the regression curve, which describes the relation
between the means of the probability distributions and X, is the counterpart to the
general tendency of Y to vary with X systematically in a statistical relation.

Note 

The expressions "independent variable" or "predictor variable" for X and
"dependent variable" or "response variable" for Y in a regression model simply are conventional
labels. There is no implication that Y causally depends on X in a given case. No matter
how strong the statistical relation, no cause-and-effect pattern is necessarily implied by
the regression model. In some applications, an independent variable actually is dependent
causally on the response variable, as when we estimate temperature (the response) from
the height of mercury (the independent variable) in a thermometer.

Regression models with more than one independent variable. Regression
models may contain more than one independent variable.

1. In an application of regression analysis pertaining to 67 branch offices of a
   consumer finance chain, the regression model contained direct operating
   cost for the year just ended as the response variable and four independent
   variables: average size of loan outstanding during the year, average
   number of loans outstanding, total number of new loan applications processed,
   and office salary scale index.

2. In a tractor purchase study, the response variable was volume (in
   horsepower) of tractor purchases in each sales territory of a farm equipment firm.
   There were nine independent variables, including average age of tractors on
   farms in the territory, number of farms in the territory, and a quantity index
   of crop production in the territory.

3. In a medical study of short children, the response variable was the peak
   plasma growth hormone level. There were 14 independent variables,
   including age, sex, height, weight, and 10 skinfold measurements.

The features represented in Figures 2.4 and 2.5 must be extended into further
dimensions when there is more than one independent variable. With two
independent variables $X_1$ and $X_2$, for instance, a probability distribution of Y for each
$(X_1, X_2)$ combination is assumed by the regression model. The systematic
relation between the means of these probability distributions and the independent
variables $X_1$ and $X_2$ is then given by a regression surface.


Construction  of  regression  models 

Selection of independent variables. Since reality must be reduced to
manageable proportions whenever we construct models, only a limited number of
independent or predictor variables can, or should, be included in a regression
model for any situation of interest. A central problem therefore is that of
choosing, for a regression model, a set of independent variables which is "good" in
some sense for the purposes of the analysis. A major consideration in making this
choice is the extent to which a chosen variable contributes to reducing the
remaining variation in Y after allowance is made for the contributions of other
independent variables that have tentatively been included in the regression
model. Other considerations include the importance of the variable as a causal
agent in the process under analysis; the degree to which observations on the
variable can be obtained more accurately, or quickly, or economically than on
competing variables; and the degree to which the variable can be preset by
management. In Chapter 12, we shall discuss procedures and problems in
choosing the independent variables to be included in a regression model.

Functional form of regression equation. The choice of the functional form
of the regression equation is related to the choice of the independent variables.
Sometimes, relevant theory may indicate the appropriate functional form.
Learning theory, for instance, may indicate that the regression function relating unit
production costs to the number of previous times the item has been produced
should have a specified shape with particular asymptotic properties.

More frequently, however, the functional form of the regression equation is
not known in advance and must be decided upon once the data have been
collected and analyzed. Thus, linear or quadratic regression functions are often used
as satisfactory first approximations to regression functions of unknown nature.
Indeed, these simple types of regression functions may be used even when theory
provides the relevant functional form, notably when the known form is highly
complex but can be reasonably approximated by a linear or quadratic regression
function. Figure 2.6a illustrates a case where a complex regression function may
be reasonably approximated by a linear regression function. Figure 2.6b provides
an example where two linear regression functions may be used "piecewise" to
approximate a complex regression function.

FIGURE  2.6  Uses  of  linear  regression  function  to  approximate  complex  regression 
functions 

(a)  (b) 


Scope of model. In formulating a regression model, we usually need to
restrict the coverage of the model to some interval or region of values of the
independent variable or variables. The scope is determined either by the design
of the investigation or by the range of data at hand. For instance, a company
studying the effect of price on sales volume investigated six price levels, ranging
from $4.95 to $6.95. Here, the scope of the model would be limited to price
levels ranging from near $5 to near $7. The shape of the regression function
would be in serious doubt substantially outside this range because the
investigation provided no evidence as to the nature of the statistical relation below $4.95
or above $6.95.


Uses  of  regression  analysis 

Regression analysis serves three major purposes: (1) description, (2) control,
and (3) prediction, as illustrated by the three examples cited earlier. The tractor
purchase study served a descriptive purpose. In the study of branch office
operating costs, the purpose was administrative control; management was able, by
developing a usable statistical relation between costs and independent variables in
the system, to set cost standards for each branch office in the company chain. In
the medical study of short children, the purpose was prediction. Clinicians were
able to use the statistical relation to predict growth hormone deficiencies in short
children using simple measurements of the children.

The several purposes of regression analysis frequently overlap in practice.
The Westwood Company lot size example provides a case in point. Knowledge
of the relation between lot size and man-hours in past production runs enables
management to predict the man-hour requirements for the next production run of
given lot size, for purposes of cost estimation and production scheduling. After
the run is completed, management can compare the actual man-hours against the
predicted hours for purposes of administrative control.

2.3  REGRESSION  MODEL  WITH  DISTRIBUTION 
OF  ERROR  TERMS  UNSPECIFIED 

Formal  statement  of  model 

In  Part  I  of  this  book,  we  consider  a  basic  regression  model  where  there  is 
only  one  independent  variable  and  the  regression  function  is  linear.  The  model 
can  be  stated  as  follows: 

(2.1)  $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

where:

$Y_i$ is the value of the response variable in the $i$th trial
$\beta_0$ and $\beta_1$ are parameters
$X_i$ is a known constant, namely, the value of the independent variable in
the $i$th trial
$\varepsilon_i$ is a random error term with mean $E(\varepsilon_i) = 0$ and variance
$\sigma^2(\varepsilon_i) = \sigma^2$; $\varepsilon_i$ and $\varepsilon_j$ are uncorrelated so that the
covariance $\sigma(\varepsilon_i, \varepsilon_j) = 0$ for all $i, j$; $i \neq j$
$i = 1, \ldots, n$

Model (2.1) is said to be simple, linear in the parameters, and linear in the
independent variable. It is “simple” in that there is only one independent variable,
“linear in the parameters” because no parameter appears as an exponent or
is multiplied or divided by another parameter, and “linear in the independent
variable” because this variable appears only in the first power. A model which is
linear in the parameters and the independent variable is also called a first-order
model.

Important  features  of  model 

1. The observed value of Y in the $i$th trial is the sum of two components: (1)
the constant term $\beta_0 + \beta_1 X_i$ and (2) the random term $\varepsilon_i$. Hence, $Y_i$ is a
random variable.

2. Since $E(\varepsilon_i) = 0$, it follows from (1.13c) that:

$E(Y_i) = E(\beta_0 + \beta_1 X_i + \varepsilon_i) = \beta_0 + \beta_1 X_i + E(\varepsilon_i) = \beta_0 + \beta_1 X_i$

Note that $\beta_0 + \beta_1 X_i$ plays the role of the constant $a$ in theorem (1.13c).

Thus, the response $Y_i$, when the level of X existing in the $i$th trial is $X_i$, comes
from a probability distribution whose mean is:

(2.2)  $E(Y_i) = \beta_0 + \beta_1 X_i$

We therefore know that the regression function for model (2.1) is:

(2.3)  $E(Y) = \beta_0 + \beta_1 X$

since the regression function relates the means of the probability distributions of
Y for any given X to the level of X.

3. The observed value of Y in the $i$th trial exceeds or falls short of the value of
the regression function by the error term amount $\varepsilon_i$.

4. The error terms $\varepsilon_i$ are assumed to have constant variance $\sigma^2$. It therefore
follows that the variance of the response $Y_i$ is:

(2.4)  $\sigma^2(Y_i) = \sigma^2$

since, using theorem (1.15a), we have:

$\sigma^2(\beta_0 + \beta_1 X_i + \varepsilon_i) = \sigma^2(\varepsilon_i) = \sigma^2$

Thus, model (2.1) assumes that the probability distributions of Y have the
same variance $\sigma^2$, regardless of the level of the independent variable X.

5. The error terms are assumed to be uncorrelated. Hence, the outcome in
any one trial has no effect on the error term for any other trial, as to whether it
is positive or negative, or small or large. Since the error terms $\varepsilon_i$ and $\varepsilon_j$ are
uncorrelated, so are the responses $Y_i$ and $Y_j$.

6. In summary, model (2.1) implies that the response variable observations $Y_i$
come from probability distributions whose means are $E(Y_i) = \beta_0 + \beta_1 X_i$ and
whose variances are $\sigma^2$, the same for all levels of X. Further, any two observations
$Y_i$ and $Y_j$ are uncorrelated.
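
Features 2 and 4 can be checked by simulation. The following is only a sketch under stated assumptions: the parameter values 9.5 and 2.1 are the illustrative ones used in the example that follows, while the uniform error distribution on (-5, 5), the seed, the number of trials, and the function name are hypothetical choices made here for illustration. Model (2.1) itself leaves the error distribution unspecified, so any distribution with mean zero and constant variance would serve.

```python
import random
import statistics

BETA0, BETA1 = 9.5, 2.1   # illustrative parameter values, treated as known
HALF_WIDTH = 5.0          # assumed errors: Uniform(-5, 5), mean 0, variance 25/3


def simulate_responses(x, trials, rng):
    """Draw `trials` responses Y = beta0 + beta1*x + error at a fixed level x."""
    return [BETA0 + BETA1 * x + rng.uniform(-HALF_WIDTH, HALF_WIDTH)
            for _ in range(trials)]


rng = random.Random(0)    # fixed seed so the run is repeatable
summary = {}
for x in (25, 45):
    ys = simulate_responses(x, 20000, rng)
    # Sample mean estimates E(Y) at this x; sample variance estimates sigma^2.
    summary[x] = (statistics.mean(ys), statistics.variance(ys))
    print(x, round(summary[x][0], 2), round(summary[x][1], 2))
```

The sample means should fall close to 9.5 + 2.1(25) = 62 and 9.5 + 2.1(45) = 104, while the two sample variances should both be close to 25/3, whatever the level of X.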

Example 

Suppose that regression model (2.1) is applicable for the Westwood Company
lot size application and is as follows:

$Y_i = 9.5 + 2.1X_i + \varepsilon_i$

Figure 2.7 contains a presentation of the regression function:

$E(Y) = 9.5 + 2.1X$

Suppose that in the $i$th trial, a lot of $X_i = 45$ units is produced and the actual
number of man-hours is $Y_i = 108$. In that case, the error term value is $\varepsilon_i = +4$,
for we have:

$E(Y_i) = 9.5 + 2.1(45) = 104$

and:

$Y_i = 108 = 104 + 4$

Figure 2.7 displays the probability distribution of Y when X = 45, and indicates
from where in this distribution the observation $Y_i = 108$ came. Note again that
the error term $\varepsilon_i$ is simply the deviation of $Y_i$ from its mean value $E(Y_i)$.

FIGURE 2.7 Illustration of linear regression model (2.1) (vertical axis: Man-Hours)


Figure  2.7  also  shows  the  probability  distribution  of  Y  when  X  =  25.  Note 
that  this  distribution  exhibits  the  same  variability  as  the  probability  distribution 
when  X  =  45,  in  conformance  with  the  requirements  of  model  (2.1). 


Meaning  of  regression  parameters 

The parameters $\beta_0$ and $\beta_1$ in regression model (2.1) are called regression
coefficients. $\beta_1$ is the slope of the regression line. It indicates the change in the
mean of the probability distribution of Y per unit increase in X. The parameter $\beta_0$
is the Y intercept of the regression line. If the scope of the model includes X = 0,
$\beta_0$ gives the mean of the probability distribution of Y at X = 0. When the scope
of the model does not cover X = 0, $\beta_0$ does not have any particular meaning as
a separate term in the regression model.

Example. Figure 2.8 shows the regression function:

$E(Y) = 9.5 + 2.1X$

for the previous Westwood Company lot size example. The slope $\beta_1 = 2.1$ indicates
that an increase of one unit in lot size leads to an increase in the mean of
the probability distribution of Y of 2.1 man-hours.

FIGURE 2.8 Meaning of linear regression parameters (vertical axis: Man-Hours)

The intercept $\beta_0 = 9.5$ indicates the value of the regression function at X = 0.
However, since the linear regression model was formulated to apply to lot sizes
ranging from 20 to 80 units, $\beta_0$ does not have any intrinsic meaning of its own.
In particular, it does not necessarily indicate the average setup time for the
process (the average man-hours before actual output begins). A model with a
curvilinear regression function and some different value of $\beta_0$ than that in the
linear model might well be required if the scope of the model were to extend to
lot sizes down to zero.


Alternative  versions  of  model 

Sometimes it is convenient to write model (2.1) in somewhat different,
though equivalent, forms. Let $X_0$ be a dummy variable identically equal to one.
Then, we can write (2.1) as follows:

(2.5)  $Y_i = \beta_0 X_0 + \beta_1 X_i + \varepsilon_i$  where $X_0 = 1$

Another modification sometimes helpful is to use for the independent variable
the deviation $X_i - \bar{X}$ rather than $X_i$. To leave model (2.1) unchanged, we need to
write:

$Y_i = \beta_0 + \beta_1 (X_i - \bar{X}) + \beta_1 \bar{X} + \varepsilon_i$
$= (\beta_0 + \beta_1 \bar{X}) + \beta_1 (X_i - \bar{X}) + \varepsilon_i$
$= \beta_0^* + \beta_1 (X_i - \bar{X}) + \varepsilon_i$

Thus, an alternative model version is:

(2.6)  $Y_i = \beta_0^* + \beta_1 (X_i - \bar{X}) + \varepsilon_i$

where:

(2.6a)  $\beta_0^* = \beta_0 + \beta_1 \bar{X}$

We  shall  use  models  (2.1),  (2.5),  and  (2.6)  interchangeably  as  convenience 
dictates. 
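
The equivalence of (2.1) and (2.6) can be confirmed numerically. The sketch below is an illustration only: it assumes the parameter values 9.5 and 2.1 of the Westwood example and takes as X levels the ten lot sizes that appear later in Table 2.1; the function and variable names are hypothetical.

```python
BETA0, BETA1 = 9.5, 2.1                          # illustrative parameter values
xs = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]    # lot sizes (Table 2.1)
x_bar = sum(xs) / len(xs)

beta0_star = BETA0 + BETA1 * x_bar               # relation (2.6a)


def mean_original(x):
    """Mean response under model (2.1)."""
    return BETA0 + BETA1 * x


def mean_centered(x):
    """Mean response under the alternative model (2.6)."""
    return beta0_star + BETA1 * (x - x_bar)


# Largest disagreement between the two forms over the X levels used.
max_gap = max(abs(mean_original(x) - mean_centered(x)) for x in xs)
print(beta0_star, max_gap)
```

With a mean lot size of 50, the new intercept is 9.5 + 2.1(50) = 114.5, the mean response at the mean level of X, and the two forms agree at every X up to floating-point rounding.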


2.4  ESTIMATION  OF  REGRESSION  FUNCTION 
Obtaining  needed  sample  data 

Ordinarily, we do not know the values of the regression parameters $\beta_0$ and $\beta_1$
in model (2.1) and need to estimate them from sample data. Such sample data
may  be  obtained  by  experimental  or  nonexperimental  means.  We  shall  briefly 
consider  each  in  turn. 

Sometimes,  it  is  possible  to  conduct  a  controlled  experiment  to  provide  data 
from  which  the  regression  parameters  can  be  estimated.  Consider,  for  instance, 
an  insurance  company  that  wishes  to  study  the  relation  between  productivity  of 
its  clerks  in  processing  claims  and  amount  of  training.  Five  clerks  selected  at 
random  are  trained  for  two  weeks,  five  for  three  weeks,  five  for  four  weeks,  and 
five  for  five  weeks,  and  the  productivity  of  the  clerks  is  then  observed.  These 
data on length of training (X) and productivity (Y) to serve as a basis for estimating
the regression parameters are experimental data.

Often  it  is  not  practical  or  feasible  to  conduct  controlled  experiments,  in 
which  case  nonexperimental  data,  also  called  observational  data,  will  need  to  be 
utilized.  Such  data  are  obtained  without  controlling  the  independent  variable  of 
interest. For example, public health officials wishing to study the relation between
age of person (X) and number of days of illness last year (Y) would
probably use data obtained from health records or from a survey of the population
since they cannot assign ages at random to persons. Such data are nonexperimental
data since the independent variable is not controlled. Similarly, the Westwood
Company in our earlier lot size example needed to rely on nonexperimental
data  since  the  lot  size  at  any  given  time  was  dictated  by  the  demand  for  the 
product,  which  was  not  under  the  control  of  the  company. 

Once the data have been obtained, either by experiment or nonexperimentally,
they can be assembled in a table such as Table 2.1 for the Westwood
Company example. We shall denote the (X, Y) observations for the first trial as
$(X_1, Y_1)$, for the second trial $(X_2, Y_2)$, and in general for the $i$th trial $(X_i, Y_i)$, where
$i = 1, \ldots, n$. For the data in Table 2.1, $X_1 = 30$, $Y_1 = 73$, and so on, and $n = 10$.


TABLE 2.1 Data on lot size and number of man-hours: Westwood Company example

Production Run    Lot Size    Man-Hours
      i            $X_i$        $Y_i$

      1              30           73
      2              20           50
      3              60          128
      4              80          170
      5              40           87
      6              50          108
      7              60          135
      8              30           69
      9              70          148
     10              60          132

Method  of  least  squares 

To find “good” estimators of the regression parameters $\beta_0$ and $\beta_1$, we shall
employ the method of least squares. For each sample observation $(X_i, Y_i)$, the
method of least squares considers the deviation of $Y_i$ from its expected value:

(2.7)  $Y_i - (\beta_0 + \beta_1 X_i)$

In particular, the method of least squares requires that we consider the sum of the
n squared deviations. This criterion is denoted by Q:

(2.8)  $Q = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$

According to the method of least squares, the estimators of $\beta_0$ and $\beta_1$ are those
values $b_0$ and $b_1$, respectively, that minimize the criterion Q for the given sample
observations $(X_i, Y_i)$.

Example. Figure 2.9a contains a scatter plot of the sample data of Table 2.1
for the Westwood Company example. In Figure 2.9b is plotted a fitted regression
line using the arbitrary estimates:

$b_0 = 30$
$b_1 = 0$

Also shown in Figure 2.9b are the deviations $Y_i - 30 - (0)X_i$. Note that each
deviation corresponds to the vertical distance between $Y_i$ and the fitted regression
line. Clearly, the fit is poor. Hence, the deviations are large and so are the
squared deviations. The sum Q of the squared deviations is (observations in
ascending order):

$Q = (50 - 30)^2 + (69 - 30)^2 + \cdots + (170 - 30)^2 = 77,660$

Figure 2.9c contains the deviations $Y_i - b_0 - b_1 X_i$ for the estimates $b_0 = 15$,
$b_1 = 1.5$. Here, the fit is better (though still not good), the deviations are much
smaller, and hence the sum of the squared deviations is reduced, to Q = 4,910.
Thus, a better fit of the regression line to the data corresponds to a smaller
sum Q.

FIGURE 2.9 Example of deviations from different fitted regression lines
(horizontal axis of each panel: Lot Size)
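
The two values of Q quoted in this example can be reproduced directly from the data of Table 2.1. The short sketch below is offered only as a check on the arithmetic; the function name `q_criterion` and the variable names are hypothetical.

```python
# Westwood Company data (Table 2.1): lot sizes X and man-hours Y.
X = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
Y = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]


def q_criterion(b0, b1, xs, ys):
    """Sum of squared deviations (2.8) about the trial line b0 + b1*x."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))


q_flat = q_criterion(30, 0, X, Y)       # arbitrary line of Figure 2.9b
q_better = q_criterion(15, 1.5, X, Y)   # arbitrary line of Figure 2.9c
print(q_flat, q_better)
```

The first call gives 77,660 and the second 4,910, in agreement with the text. Any other trial pair of estimates can be scored the same way.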

The objective of the method of least squares is to find estimates $b_0$ and $b_1$ for
$\beta_0$ and $\beta_1$, respectively, for which Q is a minimum. In a certain sense, to be
discussed shortly, these estimates will provide a “good” fit of the linear regression
function.


Least squares estimators. The estimators $b_0$ and $b_1$ which satisfy the least
squares criterion could be found by a trial and error procedure. However, this is
not necessary since it can be shown that the values $b_0$ and $b_1$ which minimize Q
for any particular set of sample data are given by the following simultaneous
equations:

(2.9a)  $\sum Y_i = n b_0 + b_1 \sum X_i$

(2.9b)  $\sum X_i Y_i = b_0 \sum X_i + b_1 \sum X_i^2$

The equations (2.9a) and (2.9b) are called normal equations; $b_0$ and $b_1$ are called
point estimators of $\beta_0$ and $\beta_1$, respectively.

The quantities $\sum Y_i$, $\sum X_i$, and so on in (2.9) are calculated from the sample
observations $(X_i, Y_i)$. The equations then can be solved simultaneously for $b_0$ and
$b_1$. Alternatively, $b_0$ and $b_1$ can be obtained directly as follows:

(2.10a)  $b_1 = \frac{\sum X_i Y_i - \frac{(\sum X_i)(\sum Y_i)}{n}}{\sum X_i^2 - \frac{(\sum X_i)^2}{n}} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$

(2.10b)  $b_0 = \frac{1}{n} \left( \sum Y_i - b_1 \sum X_i \right) = \bar{Y} - b_1 \bar{X}$

where $\bar{X}$ and $\bar{Y}$ have the usual meaning.
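
The direct formulas lend themselves to a few lines of code. The sketch below applies (2.10a) and (2.10b) to the Westwood data of Table 2.1 and then confirms that the resulting values satisfy both normal equations; the function name `least_squares` and the variable names are hypothetical, and the function will fail with a division by zero if all X values are equal.

```python
# Westwood Company data (Table 2.1): lot sizes X and man-hours Y.
X = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
Y = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]


def least_squares(xs, ys):
    """Return (b0, b1) from the direct formulas (2.10a) and (2.10b)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)
    b0 = (sum_y - b1 * sum_x) / n
    return b0, b1


b0, b1 = least_squares(X, Y)

# Left side minus right side of each normal equation; both should vanish.
gap_a = sum(Y) - (len(X) * b0 + b1 * sum(X))                   # (2.9a)
gap_b = (sum(x * y for x, y in zip(X, Y))
         - (b0 * sum(X) + b1 * sum(x * x for x in X)))         # (2.9b)
print(b0, b1, gap_a, gap_b)
```

For these data the estimates are 10.0 and 2.0, and both gaps are zero apart from possible rounding.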

Note 


The normal equations (2.9) can be derived by calculus. For given sample observations
$(X_i, Y_i)$, the quantity Q in (2.8) is a function of $\beta_0$ and $\beta_1$. The values of $\beta_0$ and $\beta_1$ which
minimize Q can be derived by differentiating (2.8) with respect to $\beta_0$ and $\beta_1$. We obtain:

$\frac{\partial Q}{\partial \beta_0} = -2 \sum (Y_i - \beta_0 - \beta_1 X_i)$

$\frac{\partial Q}{\partial \beta_1} = -2 \sum X_i (Y_i - \beta_0 - \beta_1 X_i)$

We then set these partial derivatives equal to zero, using $b_0$ and $b_1$ to denote the particular
values of $\beta_0$ and $\beta_1$, respectively, which minimize Q:

$-2 \sum (Y_i - b_0 - b_1 X_i) = 0$
$-2 \sum X_i (Y_i - b_0 - b_1 X_i) = 0$

Simplifying, we obtain:

$\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) = 0$

$\sum_{i=1}^{n} X_i (Y_i - b_0 - b_1 X_i) = 0$

Expanding out, we have:

$\sum Y_i - n b_0 - b_1 \sum X_i = 0$
$\sum X_i Y_i - b_0 \sum X_i - b_1 \sum X_i^2 = 0$

from which the normal equations (2.9) are obtained by rearranging terms.

A test of the second partial derivatives will show that a minimum is obtained with the
least squares estimators $b_0$ and $b_1$.
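
The second-derivative test mentioned above can be sketched as follows. This working is supplied here as an aid and is not taken from the original text; it uses only the first partial derivatives given in this note.

```latex
\begin{align*}
\frac{\partial^2 Q}{\partial \beta_0^2} &= 2n, \qquad
\frac{\partial^2 Q}{\partial \beta_1^2} = 2\sum X_i^2, \qquad
\frac{\partial^2 Q}{\partial \beta_0 \, \partial \beta_1} = 2\sum X_i \\[4pt]
\frac{\partial^2 Q}{\partial \beta_0^2} \,
\frac{\partial^2 Q}{\partial \beta_1^2}
 - \left( \frac{\partial^2 Q}{\partial \beta_0 \, \partial \beta_1} \right)^2
 &= 4\left[ n\sum X_i^2 - \Bigl(\sum X_i\Bigr)^2 \right]
  = 4n\sum (X_i - \bar{X})^2
\end{align*}
```

Since 2n is positive and the last expression is positive whenever the $X_i$ are not all equal, the stationary point given by the normal equations is a minimum of Q.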


Properties of least squares estimators. An important theorem, called the
Gauss-Markov theorem, states:

(2.11)  Under the conditions of model (2.1), the least squares estimators $b_0$
and $b_1$ in (2.10) are unbiased and have minimum variance among
all unbiased linear estimators.

This theorem, which is proven in the next chapter, states first that both $b_0$ and
$b_1$ are unbiased estimators. Hence:

$E(b_0) = \beta_0$
$E(b_1) = \beta_1$

so that neither estimator tends to overestimate or underestimate systematically.

Second, the theorem states that the sampling distributions of $b_0$ and $b_1$ have
smaller variability than those of any other estimators belonging to a particular
class of estimators. Thus, the least squares estimators are more precise than any
of these other estimators. The class of estimators for which the least squares
estimators are “best” consists of all unbiased estimators which are linear functions
of the observations $Y_1, \ldots, Y_n$. The estimators $b_0$ and $b_1$ are such linear
functions of the $Y_i$. Consider, for instance, $b_1$. We have from (2.10a):

$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$

It will be shown in (3.5) that this expression is equal to:

$b_1 = \frac{\sum (X_i - \bar{X}) Y_i}{\sum (X_i - \bar{X})^2} = \sum k_i Y_i$

where:

$k_i = \frac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2}$

Since the $k_i$ are known constants (because the $X_i$ are known constants), $b_1$ is a
linear combination of the $Y_i$ and hence is a linear estimator.

In the same fashion, it can be shown that $b_0$ is a linear estimator.

Among all linear estimators that are unbiased then, $b_0$ and $b_1$ have the smallest
variability in repeated samples in which the X levels remain unchanged.
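
That $b_1$ is a linear combination of the observations can be seen concretely by computing the constants $k_i$ for the Westwood data of Table 2.1. The sketch below is illustrative only and its variable names are hypothetical. It also checks two algebraic facts about the $k_i$ that follow from their definition: they sum to zero, and the sum of $k_i X_i$ equals one.

```python
# Westwood Company data (Table 2.1): lot sizes X and man-hours Y.
X = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
Y = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]

x_bar = sum(X) / len(X)
sxx = sum((x - x_bar) ** 2 for x in X)     # sum of squared X deviations

# The weights k_i depend on the X levels only, not on the observed Y values.
k = [(x - x_bar) / sxx for x in X]

b1_linear = sum(ki * yi for ki, yi in zip(k, Y))   # b1 as a weighted sum of Y
sum_k = sum(k)
sum_kx = sum(ki * xi for ki, xi in zip(k, X))
print(b1_linear, sum_k, sum_kx)
```

The weighted sum should reproduce, up to rounding, the slope estimate 2.0 found for these data from (2.10a).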


Example. To illustrate the calculation of the least squares estimators $b_0$ and
$b_1$, we will use the Westwood Company case discussed earlier. The sample data
are given in Table 2.1 and plotted in Figure 2.9a. Table 2.2 gives the basic results
required to calculate $b_0$ and $b_1$. We have: $\sum Y_i = 1,100$, $\sum X_i = 500$, $\sum X_i Y_i = 61,800$,
$\sum X_i^2 = 28,400$, $n = 10$. Using (2.10) we obtain:

$b_1 = \frac{\sum X_i Y_i - \frac{(\sum X_i)(\sum Y_i)}{n}}{\sum X_i^2 - \frac{(\sum X_i)^2}{n}} = \frac{61,800 - \frac{(500)(1,100)}{10}}{28,400 - \frac{(500)^2}{10}} = 2.0$

$b_0 = \frac{1}{n} \left( \sum Y_i - b_1 \sum X_i \right) = \frac{1}{10} [1,100 - 2.0(500)] = 10.0$

Thus, we estimate that the mean number of man-hours increases by 2.0 hours for
each unit increase in lot size.


TABLE 2.2 Basic calculations to obtain $b_0$ and $b_1$: Westwood Company example

                                                  (for later use)
          $Y_i$    $X_i$    $X_i Y_i$    $X_i^2$      $Y_i^2$

            73      30        2,190        900        5,329
            50      20        1,000        400        2,500
           128      60        7,680      3,600       16,384
           170      80       13,600      6,400       28,900
            87      40        3,480      1,600        7,569
           108      50        5,400      2,500       11,664
           135      60        8,100      3,600       18,225
            69      30        2,070        900        4,761
           148      70       10,360      4,900       21,904
           132      60        7,920      3,600       17,424

Total    1,100     500       61,800     28,400      134,660


Point  estimation  of  mean  response 

Estimated regression function. Given sample estimators $b_0$ and $b_1$ of the
parameters in the regression function (2.3):

$E(Y) = \beta_0 + \beta_1 X$

it is natural that we estimate the regression function as follows:

(2.12)  $\hat{Y} = b_0 + b_1 X$

where $\hat{Y}$ (read Y hat) is the value of the estimated regression function at the level
X of the independent variable.

We will call a value of the response variable a response and will call E(Y) the
mean response. Thus, the mean response is the mean of the probability distribution
of Y corresponding to the level X of the independent variable. $\hat{Y}$ then is a
point estimator of the mean response when the level of the independent variable
is X. It can be shown as an extension of the Gauss-Markov theorem (2.11) that $\hat{Y}$
is an unbiased estimator of E(Y), with minimum variance in the class of unbiased
linear estimators.

For the observations in the sample, we will call $\hat{Y}_i$:

(2.13)  $\hat{Y}_i = b_0 + b_1 X_i \qquad i = 1, \ldots, n$

the fitted value for the $i$th observation. Thus, the fitted value $\hat{Y}_i$ is to be viewed in
distinction to the observed value $Y_i$.

Example. For the Westwood Company case, we found that the least squares
estimates of the regression coefficients were:

$b_0 = 10.0 \qquad b_1 = 2.0$

Hence, the estimated regression function is:

$\hat{Y} = 10.0 + 2.0X$

If we are interested in the mean number of man-hours when the lot size is
X = 55, our point estimate would be:

$\hat{Y} = 10.0 + 2.0(55) = 120$

Thus, we would estimate that the mean number of man-hours for production runs
of X = 55 units is 120. We interpret this to mean that if many runs of size 55 are
produced under the conditions of the 10 runs in the sample, the mean labor time
for these many runs is about 120 hours. Of course, the labor time for any one run
of size 55 is likely to fall above or below the mean response because of inherent
variability in the system, as represented by the error term in the model.

Figure 2.10 contains a computer plot of the estimated regression function
$\hat{Y} = 10.0 + 2.0X$, as well as the original data. Note the improved fit of the least
squares regression line over the arbitrary lines in Figure 2.9. Indeed, the criterion
Q for the least squares regression line now is only Q = 60, as will be shown
shortly, a much smaller value than the values of Q for the arbitrary fitted lines in
Figure 2.9.

FIGURE 2.10 Observations and least squares regression line for Westwood Company
example: $b_0 = 10.0$, $b_1 = 2.0$

Fitted values for the sample data are obtained by substituting the X values in
the sample into the estimated regression equation. For example, for our sample
data, $X_1 = 30$. Hence, the fitted value is:

$\hat{Y}_1 = 10.0 + 2.0(30) = 70$

This compares with the observed man-hours of $Y_1 = 73$. Table 2.3 contains all
the observed and fitted values for the Westwood Company data.
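
The point estimate at X = 55, the fitted values, and the value Q = 60 can all be reproduced from the estimated regression function. The sketch below assumes the estimates 10.0 and 2.0 obtained above; the function and variable names are hypothetical.

```python
# Westwood Company data (Table 2.1): lot sizes X and man-hours Y.
X = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
Y = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]

B0, B1 = 10.0, 2.0   # least squares estimates found earlier


def fitted(x):
    """Estimated mean response (2.12) at lot size x."""
    return B0 + B1 * x


estimate_at_55 = fitted(55)                  # point estimate of E(Y) at X = 55
fitted_values = [fitted(x) for x in X]       # fitted values (2.13)

# Deviations of the observed values from the fitted line, and their
# sum of squares, which is the criterion Q at the least squares estimates.
deviations = [y - f for y, f in zip(Y, fitted_values)]
q_min = sum(d ** 2 for d in deviations)
print(estimate_at_55, fitted_values[0], q_min)
```

The first fitted value is 70 against an observed 73, and the criterion evaluates to 60, far below the values 77,660 and 4,910 obtained for the arbitrary lines.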

TABLE 2.3 Fitted values, residuals, and squared residuals: Westwood Company
example

Observation    Lot      Man-     Estimated Mean
  Number       Size     Hours       Response         Residual            Squared Residual
     i         $X_i$    $Y_i$      $\hat{Y}_i$     $Y_i - \hat{Y}_i = e_i$   $(Y_i - \hat{Y}_i)^2 = e_i^2$

     1          30        73          70               +3                      9
     2          20        50          50                0                      0
     3          60       128         130               -2                      4
     4          80       170         170                0                      0
     5          40        87          90               -3                      9
     6          50       108         110               -2                      4
     7          60       135         130               +5                     25
     8          30        69          70               -1                      1
     9          70       148         150               -2                      4
    10          60       132         130               +2                      4

  Total        500     1,100       1,100                0                     60

Alternative model (2.6). If the alternative regression model (2.6):

$Y_i = \beta_0^* + \beta_1 (X_i - \bar{X}) + \varepsilon_i$

is to be utilized, the least squares estimator $b_1$ of $\beta_1$ is the same as before. The
least squares estimator of $\beta_0^* = \beta_0 + \beta_1 \bar{X}$ is, using (2.10b):

(2.14)  $b_0^* = b_0 + b_1 \bar{X} = (\bar{Y} - b_1 \bar{X}) + b_1 \bar{X} = \bar{Y}$

Hence, the estimated regression equation for alternative model (2.6) is:

(2.15)  $\hat{Y} = \bar{Y} + b_1 (X - \bar{X})$

In our Westwood Company example, $\bar{Y} = 1,100/10 = 110$ and $\bar{X} = 500/10 = 50$
(Table 2.2). Hence, the estimated regression equation in alternative form
is:

$\hat{Y} = 110.0 + 2.0(X - 50)$

For our sample data, $X_1 = 30$; hence, we estimate the mean response to be:

$\hat{Y}_1 = 110.0 + 2.0(30 - 50) = 70$

which, of course, is identical to our earlier result.


Residuals 

The $i$th residual is the difference between the observed value $Y_i$ and the corresponding
fitted value $\hat{Y}_i$. Denoting this residual by $e_i$, we can write:

(2.16)  $e_i = Y_i - \hat{Y}_i = Y_i - b_0 - b_1 X_i$

Figure 2.11 shows the 10 residuals for the Westwood Company example. The
magnitudes of the residuals are shown by the vertical lines between an observation
and the fitted value on the estimated regression line. The residuals are calculated
in Table 2.3 above.

FIGURE 2.11 Least squares regression line and residuals: Westwood Company
example (observed values and residuals not plotted to scale; vertical axis: Man-Hours)

We need to distinguish between the model error term value $\varepsilon_i = Y_i - E(Y_i)$
and the residual $e_i = Y_i - \hat{Y}_i$. The former involves the vertical deviation of $Y_i$
from the unknown population regression line, and hence is unknown. On the
other hand, the residual is the observed vertical deviation of $Y_i$ from the fitted
regression line.

Residuals  are  highly  useful  for  studying  whether  a  given  regression  model  is 
appropriate  for  the  data  at  hand.  We  shall  discuss  this  use  in  Chapter  4. 

Properties  of  fitted  regression  line 

The  regression  line  fitted  by  the  method  of  least  squares  has  a  number  of 
properties  worth  noting: 


1. The sum of the residuals is zero:

(2.17)  $\sum_{i=1}^{n} e_i = 0$

This property can be proven easily. We have:

$\sum e_i = \sum (Y_i - b_0 - b_1 X_i)$
$= \sum Y_i - n b_0 - b_1 \sum X_i = 0$

by the first normal equation (2.9a). Table 2.3 illustrates this property for our
earlier example. Rounding errors may, of course, be present in any particular
case.

2. The sum of the squared residuals, $\sum e_i^2$, is a minimum. This was the requirement
to be satisfied in deriving the least squares estimators of the regression
parameters.

3. The sum of the observed values $Y_i$ equals the sum of the fitted values $\hat{Y}_i$:

(2.18)  $\sum_{i=1}^{n} Y_i = \sum_{i=1}^{n} \hat{Y}_i$

This condition is implicit in the first normal equation (2.9a):

$\sum Y_i = n b_0 + b_1 \sum X_i$
$= \sum b_0 + \sum b_1 X_i = \sum (b_0 + b_1 X_i) = \sum \hat{Y}_i$

It follows from (2.18) that the mean of the $\hat{Y}_i$ is the same as the mean of the $Y_i$,
namely, $\bar{Y}$.

4. The sum of the weighted residuals is zero when the residual in the $i$th trial
is weighted by the level of the independent variable in the $i$th trial:

(2.19)  $\sum_{i=1}^{n} X_i e_i = 0$

This follows from the second normal equation (2.9b):

$\sum X_i e_i = \sum X_i (Y_i - b_0 - b_1 X_i)$
$= \sum X_i Y_i - b_0 \sum X_i - b_1 \sum X_i^2 = 0$

5. The sum of the weighted residuals is zero when the residual in the $i$th trial
is weighted by the fitted value of the response variable for the $i$th trial:

(2.20)  $\sum_{i=1}^{n} \hat{Y}_i e_i = 0$

This property is a consequence of (2.17) and (2.19), since
$\sum \hat{Y}_i e_i = b_0 \sum e_i + b_1 \sum X_i e_i$.

6. The regression line always goes through the point $(\bar{X}, \bar{Y})$. This can be
readily seen from the alternative form of the estimated regression line in (2.15).
If $X = \bar{X}$, we have:

$\hat{Y} = \bar{Y} + b_1 (X - \bar{X}) = \bar{Y} + b_1 (\bar{X} - \bar{X}) = \bar{Y}$

Figure 2.11 demonstrates this property for our lot size example.
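
These properties are easy to verify numerically for the Westwood data. The following sketch refits the line using the deviation form of (2.10a) and then evaluates the quantities in (2.17) through (2.20), together with property 6; all names in the code are hypothetical, and small nonzero values may appear with other data because of rounding.

```python
# Westwood Company data (Table 2.1): lot sizes X and man-hours Y.
X = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
Y = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n

# Least squares estimates, deviation form of (2.10a) and (2.10b).
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
      / sum((x - x_bar) ** 2 for x in X))
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in X]
e = [y - f for y, f in zip(Y, fitted)]          # residuals (2.16)

checks = {
    "sum of residuals (2.17)": sum(e),
    "sum Y minus sum fitted (2.18)": sum(Y) - sum(fitted),
    "sum of X-weighted residuals (2.19)": sum(x * r for x, r in zip(X, e)),
    "sum of fitted-weighted residuals (2.20)": sum(f * r for f, r in zip(fitted, e)),
    "line at X-bar minus Y-bar (property 6)": (b0 + b1 * x_bar) - y_bar,
}
for name, value in checks.items():
    print(name, value)
```

Each printed value should be zero, apart from possible rounding error.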

2.5  ESTIMATION OF ERROR TERMS VARIANCE $\sigma^2$

The variance $\sigma^2$ of the error terms $\varepsilon_i$ in the regression model (2.1) needs to be
estimated for a variety of purposes. Frequently, we would like to obtain an
indication of the variability of the probability distributions of Y. Further, as we
shall see in the next chapter, a variety of inferences concerning the regression
function and the prediction of Y require an estimate of $\sigma^2$.

Point estimator of $\sigma^2$

Single population. To lay the basis for developing an estimator of $\sigma^2$ for the
regression model (2.1), let us consider for a moment the simpler problem of
sampling from a single population. In obtaining the sample variance $s^2$, we begin
by considering the deviation of an observation $Y_i$ from the estimated mean $\bar{Y}$,
squaring it, and then summing all such squared deviations:

$\sum_{i=1}^{n} (Y_i - \bar{Y})^2$

Such a sum is called a sum of squares. The sum of squares is then divided by the
degrees of freedom associated with it. This number is $n - 1$ here, because one
degree of freedom is lost by using the estimate $\bar{Y}$ instead of the population mean
$\mu$. The resulting estimator is the usual sample variance:

$s^2 = \frac{\sum (Y_i - \bar{Y})^2}{n - 1}$

which is an unbiased estimator of the variance $\sigma^2$ of an infinite population. The
sample variance is often called a mean square, because a sum of squares has
been divided by the appropriate number of degrees of freedom.

Regression model. The logic of developing an estimator of $\sigma^2$ for the regression
model is the same as when sampling a single population. Recall in this
connection from (2.4) that the variance of each observation $Y_i$ is $\sigma^2$, the same as
that of each error term $\varepsilon_i$. We again need to calculate a sum of squared deviations,
but must recognize that the $Y_i$ come from different probability distributions
with different means, depending upon the level $X_i$. Thus, the deviation of an
observation $Y_i$ must be calculated around its own estimated mean $\hat{Y}_i$. Hence, the
deviations are the residuals:

$Y_i - \hat{Y}_i = e_i$

and the appropriate sum of squares, denoted by SSE, is:

(2.21)  $SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2 = \sum_{i=1}^{n} e_i^2$

where SSE stands for error sum of squares or residual sum of squares.

The sum of squares SSE has $n - 2$ degrees of freedom associated with it. Two
degrees of freedom are lost because both $\beta_0$ and $\beta_1$ had to be estimated in
obtaining $\hat{Y}_i$. Hence, the appropriate mean square, denoted by MSE, is:

(2.22)  $MSE = \frac{SSE}{n - 2} = \frac{\sum (Y_i - \hat{Y}_i)^2}{n - 2} = \frac{\sum (Y_i - b_0 - b_1 X_i)^2}{n - 2} = \frac{\sum e_i^2}{n - 2}$

where MSE stands for error mean square or residual mean square.

It can be shown that MSE is an unbiased estimator of $\sigma^2$ for the regression
model (2.1):

(2.23)  $E(MSE) = \sigma^2$

An estimator of the standard deviation $\sigma$ is simply the positive square root of
MSE.


Alternative  computational  formulas 

There are a number of alternative computational formulas for SSE. Three of
these are as follows:

(2.24a)  $SSE = \sum Y_i^2 - b_0 \sum Y_i - b_1 \sum X_i Y_i$

(2.24b)  $SSE = \sum (Y_i - \bar{Y})^2 - \frac{[\sum (X_i - \bar{X})(Y_i - \bar{Y})]^2}{\sum (X_i - \bar{X})^2}$

(2.24c)  $SSE = \sum Y_i^2 - \frac{(\sum Y_i)^2}{n} - \frac{\left[ \sum X_i Y_i - \frac{(\sum X_i)(\sum Y_i)}{n} \right]^2}{\sum X_i^2 - \frac{(\sum X_i)^2}{n}}$

Comments

1. Formula (2.24a) is useful if $b_0$ and $b_1$ have already been calculated. Otherwise,
(2.24b) and (2.24c) are more direct.

2. In (2.24a), the estimates $b_0$ and $b_1$ should be carried to a large number of digits in
order to yield reliable results for SSE.

3. To obtain (2.24a), recall that by (2.21) we have:

$SSE = \sum (Y_i - b_0 - b_1 X_i)^2$

Thus:

$SSE = \sum Y_i^2 - 2 b_0 \sum Y_i - 2 b_1 \sum X_i Y_i + n b_0^2 + 2 b_0 b_1 \sum X_i + b_1^2 \sum X_i^2$
$= \sum Y_i^2 - 2 b_0 \sum Y_i - 2 b_1 \sum X_i Y_i + b_0 (n b_0 + b_1 \sum X_i) + b_1 (b_0 \sum X_i + b_1 \sum X_i^2)$

The expressions in parentheses are equal to $\sum Y_i$ and $\sum X_i Y_i$, respectively, by the normal
equations (2.9). Substituting these terms within the parentheses yields an expression
which reduces directly to (2.24a).

4. None of the three alternative formulas explicitly provides the residuals $e_i$. As noted
earlier, the residuals are useful for studying the appropriateness of the model.
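
The agreement of the alternative formulas can be checked on the Westwood data of Table 2.1. The sketch below is an illustration under one stated assumption: (2.24c) is taken to be (2.24b) rewritten in terms of the raw sums, which is an algebraic identity. The variable names are hypothetical.

```python
# Westwood Company data (Table 2.1): lot sizes X and man-hours Y.
X = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
Y = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]

n = len(X)
sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_xx = sum(x * x for x in X)
sum_yy = sum(y * y for y in Y)
x_bar, y_bar = sum_x / n, sum_y / n

# Least squares estimates from (2.10a) and (2.10b).
b1 = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)
b0 = (sum_y - b1 * sum_x) / n

# (2.24a): uses the estimates b0 and b1.
sse_a = sum_yy - b0 * sum_y - b1 * sum_xy

# (2.24b): uses deviations from the means.
sse_b = (sum((y - y_bar) ** 2 for y in Y)
         - sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) ** 2
         / sum((x - x_bar) ** 2 for x in X))

# (2.24c): raw-sums version of (2.24b) (assumed form, see lead-in).
sse_c = (sum_yy - sum_y ** 2 / n
         - (sum_xy - sum_x * sum_y / n) ** 2 / (sum_xx - sum_x ** 2 / n))
print(sse_a, sse_b, sse_c)
```

All three expressions should give the same value for a given data set, apart from rounding. As comment 2 warns, (2.24a) is the most sensitive to rounding in $b_0$ and $b_1$, because it subtracts large quantities of nearly equal size.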


Example 

Returning  to  our  Westwood  Company  lot  size  example,  we  will  calculate  SSE 
by  (2.21).  The  residuals  were  obtained  earlier  in  Table  2.3.  This  table  also  shows 
the  squared  residuals.  From  these  results,  we  obtain: 

SSE  =  60 

Since  10  —  2  =  8  degrees  of  freedom  are  associated  with  SSE,  we  find: 

$MSE = \frac{60}{8} = 7.5$

Finally, a point estimate of $\sigma$, the standard deviation of the probability distribution of Y for any X, is $\sqrt{7.5} = 2.74$ man-hours.

Consider  again  the  case  where  the  lot  size  is  X  =  55  units.  We  estimated 
earlier  that  the  probability  distribution  of  Y  for  this  lot  size  has  a  mean  of  120 
man-hours.  Now,  we  have  the  additional  information  that  the  standard  deviation 
of  this  distribution  is  estimated  to  be  2.74  man-hours. 

If we wished to use, say, (2.24a) for calculating SSE, we would need $\sum Y_i^2$. This sum is calculated in Table 2.2. We then obtain, using the results in Table 2.2 and the estimates $b_0 = 10.0$, $b_1 = 2.0$:

$SSE = \sum Y_i^2 - b_0 \sum Y_i - b_1 \sum X_i Y_i$
$= 134,660 - 10.0(1,100) - 2.0(61,800) = 60$

which  is,  of  course,  the  same  result  (except  sometimes  for  rounding  errors)  as 
obtained  earlier. 
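The arithmetic of this example can be retraced in a few lines. This is a small illustrative sketch in Python (the variable names are ours, not the book's), starting from the three sums quoted from Table 2.2 and the estimates $b_0 = 10.0$, $b_1 = 2.0$.

```python
import math

n = 10                 # number of production runs
sum_yy = 134_660       # sum of Y_i squared, from Table 2.2
sum_y = 1_100          # sum of Y_i, from Table 2.2
sum_xy = 61_800        # sum of X_i * Y_i, from Table 2.2
b0, b1 = 10.0, 2.0     # least squares estimates quoted in the text

sse = sum_yy - b0 * sum_y - b1 * sum_xy   # formula (2.24a)
mse = sse / (n - 2)                       # formula (2.22)
s = math.sqrt(mse)                        # point estimate of sigma
y_hat_55 = b0 + b1 * 55                   # estimated mean man-hours at X = 55

print(f"SSE = {sse:.1f}, MSE = {mse:.2f}, s = {s:.2f}, mean at X=55: {y_hat_55:.0f}")
```

The estimated mean of 120 man-hours at X = 55 and the estimated standard deviation of 2.74 man-hours together describe the estimated probability distribution of Y for that lot size.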


2.6  NORMAL  ERROR  REGRESSION  MODEL 

No matter what may be the functional form of the distribution of $\varepsilon_i$ (and hence of $Y_i$), the least squares method provides unbiased point estimators of $\beta_0$ and $\beta_1$ which have minimum variance among all unbiased linear estimators. To set up interval estimates and make tests, however, we do need to make an assumption about the functional form of the distribution of the $\varepsilon_i$. The standard assumption is that the error terms are normally distributed, and we will adopt it here. A normal error term greatly simplifies the theory of regression analysis and is justifiable in many real world situations where regression analysis is applied.


Normal  error  model 

The  normal  error  model  is  as  follows: 

(2.25)  $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

where:

$Y_i$ is the observed response in the $i$th trial
$X_i$ is a known constant, the level of the independent variable in the $i$th trial
$\beta_0$ and $\beta_1$ are parameters
$\varepsilon_i$ are independent $N(0, \sigma^2)$
$i = 1, \ldots, n$

Comments 

1. The symbol $N(0, \sigma^2)$ stands for “normally distributed, with mean 0 and variance $\sigma^2$.”

2. The normal error model (2.25) is the same as the regression model (2.1) with unspecified error distribution, except that model (2.25) assumes that the errors $\varepsilon_i$ are normally distributed.

3. Because model (2.25) assumes that the errors are normally distributed, the assumption of uncorrelatedness of the $\varepsilon_i$ in model (2.1) becomes one of independence in the normal error model.

4. Model (2.25) implies that the $Y_i$ are independent normal random variables, with mean $E(Y_i) = \beta_0 + \beta_1 X_i$ and variance $\sigma^2$. Figure 2.4 (p. 27) pictures this normal error model. Each of the probability distributions of Y there is normally distributed, with constant variability, and the regression function is linear.

5. A major reason why the normality assumption for the error terms is justifiable in many situations is that the error terms frequently represent the effects of many factors omitted explicitly from the model, which do affect the response to some extent and which vary at random without reference to the independent variable X. For instance, in the lot size example, such factors as time lapse since the last production run, particular machines used, season of the year, and personnel employed, could vary more or less at random from run to run, independent of lot size. Also, there might be random measurement errors in recording Y. Insofar as these random effects have a degree of mutual independence, the composite error term $\varepsilon_i$ representing all these factors would tend to comply with the central limit theorem and the error term distribution would approach normality as the number of factor effects becomes large.

A second reason why the normality assumption for the error terms is frequently justifiable is that the estimation and testing procedures to be discussed in the next chapter are based on the t distribution, which is not sensitive to moderate departures from normality. Thus, unless the departures from normality are serious, particularly with respect to skewness, the actual confidence coefficients and risks of errors will be close to the levels for exact normality.
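The central limit argument in Comment 5 can be illustrated by simulation. The sketch below is only an illustration under assumptions that are ours: each omitted factor is modeled as an independent uniform effect on (-0.5, 0.5), and normality is judged crudely by the share of composite errors lying within one standard deviation of the mean (about 0.683 for a normal distribution, about 0.577 for a single uniform effect).

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible


def composite_error(num_factors):
    """Sum of independent uniform factor effects, rescaled to variance 1."""
    total = sum(random.uniform(-0.5, 0.5) for _ in range(num_factors))
    # A uniform(-0.5, 0.5) effect has variance 1/12, so the sum has
    # variance num_factors / 12; the scale factor brings this to 1.
    return total * (12 / num_factors) ** 0.5


def share_within_one_sd(num_factors, draws=20_000):
    """Fraction of simulated errors within one sample sd of the sample mean."""
    errors = [composite_error(num_factors) for _ in range(draws)]
    mean = statistics.mean(errors)
    sd = statistics.pstdev(errors)
    return sum(abs(e - mean) <= sd for e in errors) / draws


share_1 = share_within_one_sd(1)     # one factor: error is uniform
share_12 = share_within_one_sd(12)   # twelve factors: close to normal

print(f"1 factor: {share_1:.3f}   12 factors: {share_12:.3f}")
```

With a single factor the share stays near the uniform value, while with twelve factors it moves close to the normal value, in line with the comment that the composite error approaches normality as the number of factor effects grows.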




Estimation  of  parameters  by  method  of  maximum  likelihood 


When the functional form of the probability distribution of the error terms is specified, estimators of the parameters $\beta_0$, $\beta_1$, and $\sigma^2$ can be obtained by the method of maximum likelihood. This method utilizes the joint probability distribution of the sample observations. When this joint probability distribution is viewed as a function of the parameters, given the particular sample observations, it is called the likelihood function. The likelihood function for the normal error model (2.25), given the sample observations $Y_1, \ldots, Y_n$, is:


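(2.26)  $L(\beta_0, \beta_1, \sigma^2) = \prod_{i=1}^{n} \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left[-\frac{1}{2\sigma^2}(Y_i - \beta_0 - \beta_1 X_i)^2\right] = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2\right]$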


The values of $\beta_0$, $\beta_1$, and $\sigma^2$ which maximize this likelihood function are the maximum likelihood estimators. These are:

          Parameter     Maximum Likelihood Estimator

          $\beta_0$     $b_0$, same as (2.10b)
(2.27)    $\beta_1$     $b_1$, same as (2.10a)
          $\sigma^2$    $\hat{\sigma}^2 = \frac{\sum (Y_i - \hat{Y}_i)^2}{n}$

Thus, the maximum likelihood estimators of $\beta_0$ and $\beta_1$ are the same estimators as provided by the method of least squares. The maximum likelihood estimator $\hat{\sigma}^2$ is biased, and ordinarily the unbiased estimator MSE as given in (2.22) is used. Note that the unbiased estimator MSE differs but slightly from the maximum likelihood estimator $\hat{\sigma}^2$, especially if n is not small:

(2.28)  $MSE = \frac{n}{n - 2} \hat{\sigma}^2$

Comments

1. Since the maximum likelihood estimators $b_0$ and $b_1$ are the same as the least squares estimators, they have the properties of all least squares estimators:

a. They are unbiased.

b. They have minimum variance among all unbiased linear estimators.

In addition, the maximum likelihood estimators $b_0$ and $b_1$ for the normal error model (2.25) have other desirable properties:

c. They are consistent, as defined in (1.47).

d. They are sufficient, as defined in (1.48).

e. They are minimum variance unbiased; i.e., they have minimum variance in the class of all unbiased estimators (linear or otherwise).

Thus, for the normal error model, the estimators $b_0$ and $b_1$ have many desirable properties.

2. We find the values of $\beta_0$, $\beta_1$, and $\sigma^2$ which maximize the likelihood function L in (2.26) by taking partial derivatives of L with respect to $\beta_0$, $\beta_1$, and $\sigma^2$, equating each of the partials to zero, and solving the system of equations thus obtained. We can work with $\log_e L$, rather than L, because both L and $\log_e L$ are maximized for the same values of $\beta_0$, $\beta_1$, and $\sigma^2$:

(2.29)  $\log_e L = -\frac{n}{2} \log_e 2\pi - \frac{n}{2} \log_e \sigma^2 - \frac{1}{2\sigma^2} \sum (Y_i - \beta_0 - \beta_1 X_i)^2$


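Partial differentiation of this logarithmic likelihood is much easier; it yields:

$\frac{\partial (\log_e L)}{\partial \beta_0} = \frac{1}{\sigma^2} \sum (Y_i - \beta_0 - \beta_1 X_i)$

$\frac{\partial (\log_e L)}{\partial \beta_1} = \frac{1}{\sigma^2} \sum X_i (Y_i - \beta_0 - \beta_1 X_i)$

$\frac{\partial (\log_e L)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum (Y_i - \beta_0 - \beta_1 X_i)^2$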


We now set these partial derivatives equal to zero, replacing $\beta_0$, $\beta_1$, and $\sigma^2$ by the estimators $b_0$, $b_1$, and $\hat{\sigma}^2$. We obtain after some simplification:

(2.30a)  $\sum (Y_i - b_0 - b_1 X_i) = 0$

(2.30b)  $\sum X_i (Y_i - b_0 - b_1 X_i) = 0$

(2.30c)  $\frac{\sum (Y_i - b_0 - b_1 X_i)^2}{n} = \hat{\sigma}^2$

Formulas (2.30a) and (2.30b) are identical to the earlier least squares normal equations (2.9), and (2.30c) is the biased estimator of $\sigma^2$ given earlier in (2.27).
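The maximization can also be checked numerically. The following Python sketch, offered as an illustration rather than as the book's procedure, evaluates the logarithmic likelihood (2.29) for the Westwood Company data listed in Figure 2.12 and confirms that no value on a small grid around $(b_0, b_1, \hat{\sigma}^2)$ gives a larger value. The function name and grid steps are our own choices.

```python
import math

# Westwood Company data as listed in Figure 2.12.
X = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
Y = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]
n = len(X)


def log_likelihood(beta0, beta1, sigma2):
    """Logarithmic likelihood (2.29) for the normal error model."""
    ss = sum((y - beta0 - beta1 * x) ** 2 for x, y in zip(X, Y))
    return (-n / 2 * math.log(2 * math.pi)
            - n / 2 * math.log(sigma2)
            - ss / (2 * sigma2))


# Least squares estimates, which solve (2.30a) and (2.30b).
x_bar = sum(X) / n
y_bar = sum(Y) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
      / sum((x - x_bar) ** 2 for x in X))
b0 = y_bar - b1 * x_bar

# Maximum likelihood estimator of the variance, from (2.30c).
sigma2_hat = sum((y - b0 - b1 * x) ** 2 for x, y in zip(X, Y)) / n
# Unbiased estimator via (2.28).
mse = n / (n - 2) * sigma2_hat

# Evaluate the log-likelihood at the estimates and at nearby values.
best = log_likelihood(b0, b1, sigma2_hat)
neighbours = [
    log_likelihood(b0 + d0, b1 + d1, sigma2_hat + d2)
    for d0 in (-0.5, 0.0, 0.5)
    for d1 in (-0.05, 0.0, 0.05)
    for d2 in (-1.0, 0.0, 1.0)
]

print(f"sigma2_hat = {sigma2_hat:.2f}, MSE = {mse:.2f}")
print("estimates maximize over grid:", all(v <= best + 1e-12 for v in neighbours))
```

For these data $\hat{\sigma}^2 = 60/10 = 6.0$, and multiplying by $n/(n - 2) = 10/8$ recovers MSE = 7.5, as (2.28) states.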


2.7  COMPUTER  INPUTS  AND  OUTPUTS 

Regression calculations used to be tedious chores, especially when the number of observations was large and when there were several independent variables. Today computers can be used quite easily to perform regression calculations, with one of many available packaged programs. Also, a number of calculators contain regression routines.

The  inputting  of  data  will  vary  from  program  to  program.  With  some,  the  X 
and  Y  observations  are  entered  as  separate  sets.  For  our  Westwood  Company  data 
in  Table  2.1,  one  form  of  data  input  would  be: 

    $X_i$     $Y_i$
    30        73
    20        50
    etc.      etc.

In some other cases, the inputting of the data would be in the form: $X_1$, $Y_1$, $X_2$, $Y_2$, etc. For our example, this input form would be: 30, 73, 20, 50, etc.
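The two input forms carry the same information and are easy to convert. As a hedged illustration in Python (the variable names are hypothetical and no particular package's input format is implied), the interleaved form can be split into separate X and Y sets, and rebuilt again:

```python
# Interleaved form X1, Y1, X2, Y2, ... for the first four Westwood runs.
stream = [30, 73, 20, 50, 60, 128, 80, 170]

# Every other value starting at position 0 is an X; starting at 1, a Y.
X = stream[0::2]
Y = stream[1::2]

# Going the other way: interleave separate X and Y sets again.
rebuilt = [value for pair in zip(X, Y) for value in pair]

print(X, Y, rebuilt == stream)
```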




The computer output will also vary from one program package to another. Figure 2.12 illustrates a typical output format when the linear regression model is fitted to the Westwood Company data in Table 2.1 by the SPSS computer program (Ref. 2.1). The computed values of $b_0$, $b_1$, and $\sqrt{MSE}$ are annotated in Figure 2.12, and agree with our earlier results. In subsequent chapters, we shall explain the additional output in Figure 2.12.


FIGURE 2.12  Segment of computer output for regression run on Westwood Company data (SPSS, Ref. 2.1)

    VARIABLES
    1  SIZE         2  HOURS

      30.0000        73.0000
      20.0000        50.0000
      60.0000       128.0000
      80.0000       170.0000
      40.0000        87.0000
      50.0000       108.0000
      60.0000       135.0000
      30.0000        69.0000
      70.0000       148.0000
      60.0000       132.0000

    DEPENDENT VARIABLE..  HOURS

    VARIABLE(S) ENTERED ON STEP NUMBER 1..  SIZE

    MULTIPLE R          0.99780
    R SQUARE            0.99561
    STANDARD ERROR      2.73861   <-- $\sqrt{MSE}$

    VARIABLES IN THE EQUATION

    VARIABLE       B                          STD ERROR B     F
    SIZE            2.000000   <-- $b_1$      0.04697         1813.333
    (CONSTANT)     10.00000    <-- $b_0$

    VARIABLE       MEAN         STANDARD DEV     CASES
    SIZE            50.0000      19.4365          10
    HOURS          110.0000      38.9587          10

    ANALYSIS OF VARIANCE     DF     SUM OF SQUARES     MEAN SQUARE
    REGRESSION               1.     13600.00000        13600.00000
    RESIDUAL                 8.        60.00000            7.50000
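The figures in this output can be reproduced from the ten observations. The Python sketch below is an illustration only and does not describe how SPSS computes its results. The formulas used for the standard error of $b_1$ and for F, namely $\sqrt{MSE / \sum (X_i - \bar{X})^2}$ and regression mean square divided by MSE, are standard results that are assumed here ahead of their discussion in subsequent chapters.

```python
import math

# Westwood Company data as listed in Figure 2.12.
X = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]   # SIZE
Y = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]  # HOURS
n = len(X)

x_bar = sum(X) / n
y_bar = sum(Y) / n
s_xx = sum((x - x_bar) ** 2 for x in X)
s_yy = sum((y - y_bar) ** 2 for y in Y)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))

# Coefficients reported under "B".
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Analysis of variance entries.
ss_regression = b1 * s_xy          # 1 degree of freedom
ss_residual = s_yy - ss_regression  # n - 2 degrees of freedom (SSE)
mse = ss_residual / (n - 2)

# Summary measures in the top block of the output.
r_square = ss_regression / s_yy
multiple_r = math.sqrt(r_square)
standard_error = math.sqrt(mse)

# Assumed standard formulas (explained in later chapters of the book).
std_error_b1 = math.sqrt(mse / s_xx)
f_ratio = ss_regression / mse

# Descriptive statistics (sample standard deviations, divisor n - 1).
sd_x = math.sqrt(s_xx / (n - 1))
sd_y = math.sqrt(s_yy / (n - 1))

print(f"b0={b0:.5f} b1={b1:.6f} R={multiple_r:.5f} R2={r_square:.5f}")
print(f"std error={standard_error:.5f} se(b1)={std_error_b1:.5f} F={f_ratio:.3f}")
print(f"sd SIZE={sd_x:.4f} sd HOURS={sd_y:.4f}")
```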

PROBLEMS 

2.1.  Refer  to  the  sales  volume  example  on  page  24.  Suppose  that  the  number  of  units 
sold  is  measured  accurately  but  clerical  errors  are  frequently  made  in  determining 
the  dollar  sales.  Would  the  relation  between  the  number  of  units  sold  and  dollar 
sales  still  be  a  functional  one?  Discuss. 

2.2.  The members of a health spa pay annual membership dues of $300 plus a charge of $2 for each visit to the spa. Let Y denote the total dollar cost for the year for a member and X the number of visits by the member during the year. Express the relation between X and Y mathematically. Is it a functional or a statistical relation?

2.3.  Experience with a certain type of plastic indicates that a relation exists between the hardness (measured in Brinell units) of items molded from the plastic (Y) and the elapsed time since termination of the molding process (X). It is proposed to study this relation by means of regression analysis. A participant in the discussion objects, pointing out that the hardening of the plastic “is the result of a natural chemical process that doesn’t leave anything to chance, so the relation must be mathematical and regression analysis is not appropriate.” Evaluate this objection.

2.4.  In Table 2.1, the lot size X is the same in production runs 1 and 8 but the man-hours Y differ. What feature of regression model (2.1) is illustrated by this?

2.5.  When asked to state the simple linear regression model, a student wrote it as follows: $E(Y_i) = \beta_0 + \beta_1 X_i + \varepsilon_i$. Do you agree?

2.6.  Consider the normal error regression model (2.25). Suppose that the parameter values are $\beta_0 = 200$, $\beta_1 = 5.0$, and $\sigma = 4$.

a.  Plot this normal error regression model in the fashion of Figure 2.7. Show the distributions of Y for X = 10, 20, and 40.

b.  Explain the meaning of the parameters $\beta_0$ and $\beta_1$. Assume that the scope of the model includes X = 0.

2.7.  In a simulation exercise, regression model (2.1) applies with $\beta_0 = 100$, $\beta_1 = 20$, and $\sigma^2 = 25$. An observation on Y will be made for X = 5.

a.  Can you state the exact probability that Y will fall between 195 and 205? Explain.

b.  If the normal error regression model (2.25) is applicable, can you now state the exact probability that Y will fall between 195 and 205? If so, state it.

2.8.  In Figure 2.7, suppose another observation were obtained at X = 45. Would E(Y) for this new observation still be 104? Would the Y value for this new observation again be 108?

2.9.  A student in accounting enthusiastically declared: “Regression is a very powerful tool. We can isolate fixed and variable costs by fitting a linear regression model, even when we have no data for small lots.” Discuss.

2.10.  An analyst in a large corporation studied the relation between current annual salary (Y) and age (X) for the 46 computer programmers presently employed in the company. She concluded that the relation is curvilinear, reaching a maximum at 47 years. Does this imply that the salary for a programmer increases until age 47 and then decreases? Explain.

2.11.  The regression function relating production output by an employee after taking a training program (Y) to the production output before the training program (X) is E(Y) = 20 + .95X, where X ranges from 40 to 100. An observer concludes that the training program does not raise production output on the average because $\beta_1$ is not greater than 1.0. Comment.

2.12.  Evaluate the following statement: “For the least squares method to be fully valid, it is required that the distribution of Y is normal.”

2.13.  A person states that $b_0$ and $b_1$ in the fitted regression equation (2.12) can be estimated by the method of least squares. Comment.




2.14.  According to (2.17), $\sum e_i = 0$ when model (2.1) is fitted to a set of n observations by the method of least squares. Is it also true that $\sum \varepsilon_i = 0$? Comment.

2.15.  Grade point average. The director of admissions of a small college administered a newly designed entrance test to 20 students selected at random from the new freshman class in a study to determine whether a student’s grade point average (GPA) at the end of the freshman year (Y) can be predicted from the entrance test score (X). The results of the study follow. Assume that the first-order regression model (2.1) is appropriate.


    i:        1     2     3     4     5     6     7     8     9    10
    $X_i$:  5.5   4.8   4.7   3.9   4.5   6.2   6.0   5.2   4.7   4.3
    $Y_i$:  3.1   2.3   3.0   1.9   2.5   3.7   3.4   2.6   2.8   1.6

    i:       11    12    13    14    15    16    17    18    19    20
    $X_i$:  4.9   5.4   5.0   6.3   4.6   4.3   5.0   5.9   4.1   4.7
    $Y_i$:  2.0   2.9   2.3   3.2   1.8   1.4   2.0   3.8   2.2   1.5

Summary calculational results are: $\sum X_i = 100.0$, $\sum Y_i = 50.0$, $\sum X_i^2 = 509.12$, $\sum Y_i^2 = 134.84$, $\sum X_i Y_i = 257.66$.

a.  Obtain the least squares estimates of $\beta_0$ and $\beta_1$, and state the estimated regression function.

b.  Plot the estimated regression function and the data. Does the estimated regression function appear to fit the data well?

c.  Obtain a point estimate of the mean freshman GPA when the entrance test score is X = 5.0.

d.  What is the point estimate of the change in the mean response when the entrance test score increases by one point?

2.16.  Calculator maintenance. The Tri-City Office Equipment Corporation sells an imported desk calculator on a franchise basis and performs preventive maintenance and repair service on this calculator. The data below have been collected from 18 recent calls on users to perform routine preventive maintenance service; for each call, X is the number of machines serviced and Y is the total number of minutes spent by the service person. Assume that the first-order regression model (2.1) is appropriate.

    i:        1     2     3     4     5     6     7     8     9
    $X_i$:    7     6     5     1     5     4     7     3     4
    $Y_i$:   97    86    78    10    75    62   101    39    53

    i:       10    11    12    13    14    15    16    17    18
    $X_i$:    2     8     5     2     5     7     1     4     5
    $Y_i$:   33   118    65    25    71   105    17    49    68

Summary calculational results are: $\sum Y_i = 1,152$, $\sum X_i = 81$, $\sum (Y_i - \bar{Y})^2 = 16,504$, $\sum (X_i - \bar{X})^2 = 74.5$, $\sum (X_i - \bar{X})(Y_i - \bar{Y}) = 1,098$.

a.  Obtain  the  estimated  regression  function. 

b.  Plot the estimated regression function and the data. How well does the estimated regression function fit the data?

c.  Interpret $b_0$ in your estimated regression function. Does $b_0$ provide any relevant information here? Explain.

d.  Obtain a point estimate of the mean service time when X = 5 machines are serviced.

2.17.  Airfreight breakage. A substance used in biological and medical research is shipped by airfreight to users in cartons of 1,000 ampules. The data below, involving 10 shipments, were collected on the number of times the carton was transferred from one aircraft to another over the shipment route (X) and the number of ampules found to be broken upon arrival (Y). Assume that the first-order regression model (2.1) is appropriate.

    i:        1     2     3     4     5     6     7     8     9    10
    $X_i$:    1     0     2     0     3     1     0     1     2     0
    $Y_i$:   16     9    17    12    22    13     8    15    19    11

a.  Obtain the estimated regression function. Plot the estimated regression function and the data. Does a linear regression function appear to give a good fit here?

b.  Obtain a point estimate of the expected number of broken ampules when X = 1 transfer is made.

c.  Estimate the increase in the expected number of ampules broken when there are 2 transfers as compared to 1 transfer.

d.  Verify that your fitted regression line goes through the point $(\bar{X}, \bar{Y})$.

2.18.  Plastic hardness. Refer to Problem 2.3. Twelve batches of the plastic were made, and from each batch one test item was molded and the hardness measured at some specific point in time. The results are shown below; X is elapsed time in hours, and Y is hardness in Brinell units. Assume that the first-order regression model (2.1) is appropriate.

    i:        1     2     3     4     5     6     7     8     9    10    11    12
    $X_i$:   32    48    72    64    48    16    40    48    48    24    80    56
    $Y_i$:  230   262   323   298   255   199   248   279   267   214   359   305

a.  Obtain the estimated regression function. Plot the estimated regression function and the data. Does a linear regression function appear to give a good fit here?

b.  Obtain a point estimate of the mean hardness when X = 48 hours.

c.  Obtain a point estimate of the change in mean hardness when X increases by one hour.

2.19.  Refer  to  Grade  point  average  Problem  2.15. 

a.  Obtain the residuals $e_i$. Do they sum to zero in accord with (2.17)?

b.  Estimate $\sigma^2$ and $\sigma$. In what units is $\sigma$ expressed?

2.20.  Refer to Calculator maintenance Problem 2.16.

a.  Obtain the residuals $e_i$ and the sum of the squared residuals $\sum e_i^2$. What is the relation between the sum of the squared residuals here and the quantity Q in (2.8)?

b.  Obtain point estimates of $\sigma^2$ and $\sigma$. In what units is $\sigma$ expressed?

2.21.  Refer to Airfreight breakage Problem 2.17.

a.  Obtain the residual for the first observation. What is its relation to $\varepsilon_1$?

b.  Compute $\sum e_i^2$ and MSE. What is estimated by MSE?




2.22.  Refer  to  Plastic  hardness  Problem  2.18. 

a.  Obtain the residuals $e_i$. Do they sum to zero in accord with (2.17)?

b.  Estimate $\sigma^2$ and $\sigma$. In what units is $\sigma$ expressed?

2.23.  Muscle mass. A person’s muscle mass is expected to decrease with age. To explore this relationship in women, a nutritionist randomly selected four women from each 10-year age group, beginning with age 40 and ending with age 79. The results follow; X is age, and Y is a measure of muscle mass. Assume that the first-order regression model (2.1) is appropriate.


    i:        1     2     3     4     5     6     7     8
    $X_i$:   71    64    43    67    56    73    68    56
    $Y_i$:   82    91   100    68    87    73    78    80

    i:        9    10    11    12    13    14    15    16
    $X_i$:   76    65    45    58    45    53    49    78
    $Y_i$:   65    84   116    76    97   100   105    77

a.  Obtain the estimated regression function. Plot the estimated regression function and the data. Does a linear regression function appear to give a good fit here? Does your plot support the anticipation that muscle mass decreases with age?

b.  Obtain the following: (1) a point estimate of the difference in the mean muscle mass for women differing in age by one year, (2) a point estimate of the mean muscle mass for women aged X = 60 years, (3) the value of the residual for the eighth observation, (4) a point estimate of $\sigma^2$.

2.24.  Robbery rate. A criminologist studying the relationship between population density and robbery rate in medium-sized U.S. cities collected the following data for a random sample of 16 cities; X is the population density of the city (number of people per unit area), and Y is the robbery rate last year (number of robberies per 100,000 people). Assume that the first-order regression model (2.1) is appropriate.


    i:        1     2     3     4     5     6     7     8
    $X_i$:   59    49    75    54    78    56    60    82
    $Y_i$:  209   180   195   192   215   197   208   189

    i:        9    10    11    12    13    14    15    16
    $X_i$:   69    83    88    94    47    65    89    70
    $Y_i$:  213   201   214   212   205   186   200   204

a.  Obtain the estimated regression function. Plot the estimated regression function and the data. Does the linear regression function appear to give a good fit here? Discuss.

b.  Obtain point estimates of the following: (1) the difference in the mean robbery rate in cities that differ by one unit in population density, (2) the mean robbery rate last year in cities with population density X = 60, (3) $\varepsilon_{10}$, (4)


EXERCISES 

2.25.  Refer to regression model (2.1). Assume that X = 0 is within the scope of the model. What is the implication for the regression function if $\beta_0 = 0$ so that the model is $Y_i = \beta_1 X_i + \varepsilon_i$? How would the regression function plot on a graph?

2.26.  Refer to regression model (2.1). What is the implication for the regression function if $\beta_1 = 0$ so that the model is $Y_i = \beta_0 + \varepsilon_i$? How would the regression function plot on a graph?

2.27.  Refer to Plastic hardness Problem 2.18. Suppose one test item was molded from a single batch of plastic and the hardness of this one item was measured at 12 different points in time. Would the error term in the model for this case still reflect the same effects as for the experiment initially described? Would you expect the error terms for the different points in time to be uncorrelated? Discuss.

2.28.  Derive the expression for $b_1$ in (2.10a) from the normal equations in (2.9).

2.29.  (Calculus needed.) Refer to the model $Y_i = \beta_0 + \varepsilon_i$ in Exercise 2.26. Using the method of least squares, derive the estimator of $\beta_0$ for this model.

2.30.  Prove that the least squares estimator of $\beta_0$ obtained in Exercise 2.29 is unbiased.

2.31.  Prove  the  result  in  (2.20) — that  the  sum  of  the  residuals  weighted  by  the  fitted 
values  is  zero. 

2.32.  Refer to Table 2.1 for the Westwood Company example. When asked to present a point estimate of the expected man-hours for runs of 60 pieces, a person gave the estimate 131.7 because this is the mean number of man-hours in the three runs of size 60 in the study. A critic states that this person’s approach “throws away” most of the data in the study because observations on lot sizes other than 60 are ignored. Comment.

2.33.  In Airfreight breakage Problem 2.17, the least squares estimates are $b_0 = 10.20$ and $b_1 = 4.00$, and $\sum e_i^2 = 17.60$. Evaluate the least squares criterion Q in (2.8) for the estimates: (1) $b_0 = 9$, $b_1 = 3$; (2) $b_0 = 11$, $b_1 = 5$. Is the criterion Q larger for these estimates than for the least squares estimates?

2.34.  Two observations on Y were obtained at each of three X levels, namely, X = 5, X = 10, and X = 15.

a.  Show that the least squares regression line fitted to the three points $(5, \bar{Y}_1)$, $(10, \bar{Y}_2)$, and $(15, \bar{Y}_3)$, where $\bar{Y}_1$, $\bar{Y}_2$, and $\bar{Y}_3$ denote the means of the Y observations at the three X levels, is identical to the least squares regression line fitted to the original six observations.

b.  In this study, could the error term variance $\sigma^2$ be estimated without fitting a regression line? Explain.

2.35.  In  the  Westwood  Company  example,  the  observations  at  X  =  20  and  X  =  80  fall 
directly  on  the  fitted  regression  line  (Table  2.3  and  Figure  2.2b).  If  these  two 
observations  were  deleted,  would  the  least  squares  regression  line  fitted  to  the 
remaining  eight  observations  be  changed?  [Hint:  What  is  the  contribution  of  the 
two  observations  to  the  least  squares  criterion  Q  in  (2.8)?] 

2.36.  (Calculus needed.) Refer to the model $Y_i = \beta_1 X_i + \varepsilon_i$, $i = 1, \ldots, n$, in Exercise 2.25.

a.  Find the least squares estimator of $\beta_1$.

b.  Assume that the error terms $\varepsilon_i$ are independent $N(0, \sigma^2)$ and that $\sigma^2$ is known. State the likelihood function for the n sample observations and obtain the maximum likelihood estimator of $\beta_1$. Is it the same as the least squares estimator?

c.  Show  that  the  maximum  likelihood  estimator  is  unbiased. 

2.37.  Shown below are the number of galleys of type set (X) and the dollar cost of correcting typographical errors (Y) in a random sample of recent orders handled by a firm specializing in technical printing. Assume that the regression model $Y_i = \beta_1 X_i + \varepsilon_i$ is appropriate, with normally distributed independent error terms whose variance is $\sigma^2 = 16$.

    i:        1     2     3     4     5     6
    $X_i$:    7    12     4    14    25    30
    $Y_i$:  128   213    75   250   446   540

a.  State the likelihood function for the six observations, for $\sigma^2 = 16$.

b.  Evaluate the likelihood function for $\beta_1 = 17$, 18, and 19. For which of these $\beta_1$ values is the likelihood function largest?

c.  The maximum likelihood estimator is $b_1 = \sum X_i Y_i / \sum X_i^2$. Find the maximum likelihood estimate. Are your results in part (b) consistent with this estimate?


PROJECTS 

2.38.  Refer to the SMSA data set (pp. 537-41). The number of active physicians in an SMSA (Y) is expected to be related to total population, land area, and total personal income. Assume that the first-order regression model (2.1) is appropriate in each case.

a.  Regress the number of active physicians in turn on each of the three independent variables. State the estimated regression functions.

b.  Plot the three estimated regression functions and data on separate graphs. Does a linear regression relation appear to provide a good fit in each of the three cases?

c.  Calculate  MSE  for  each  of  the  three  cases.  Which  independent  variable  leads 
to  the  smallest  variability  around  the  fitted  regression  line? 

2.39.  Refer  to  the  SMSA  data  set. 

a.  For each geographic region, regress total serious crimes in an SMSA (Y) against total population (X). Assume that the first-order regression model (2.1) is appropriate for each region. State the estimated regression functions.

b.  Are  the  estimated  regression  functions  similar  for  the  four  regions?  Discuss. 

c.  Calculate  MSE  for  each  region.  Is  the  variability  around  the  fitted  regression 
line  approximately  the  same  for  the  four  regions?  Discuss. 

2.40.  Refer to the SENIC data set (pp. 533-36). The average length of stay in a hospital (Y) is anticipated to be related to infection risk, available facilities and services, and routine chest X-ray ratio. Assume that the first-order regression model (2.1) is appropriate in each case.

a.  Regress average length of stay on each of the three independent variables. State the estimated regression functions.

b.  Plot  the  three  estimated  regression  functions  and  data  on  separate  graphs. 
Does  a  linear  relation  appear  to  provide  a  good  fit  in  each  of  the  three  cases? 

c.  Calculate  MSE  for  each  of  the  three  cases.  Which  independent  variable  leads 
to  the  smallest  variability  around  the  fitted  regression  line? 

2.41.  Refer  to  the  SENIC  data  set. 

a.  For each geographic region, regress average length of stay in hospital (Y) against infection risk (X). Assume that the first-order regression model (2.1) is appropriate for each region. State the estimated regression functions.

b.  Are  the  estimated  regression  functions  similar  for  the  four  regions?  Discuss. 

c.  Calculate  MSE  for  each  region.  Is  the  variability  around  the  fitted  regression 
line  approximately  the  same  for  the  four  regions?  Discuss. 


CITED  REFERENCE 


2.1  Nie,  N.  H.;  C.  H.  Hull;  J.  G.  Jenkins;  K.  Steinbrenner;  and  D.  H.  Bent.  SPSS: 
Statistical  Package  for  the  Social  Sciences.  2d  ed.  New  York:  McGraw-Hill,  1975. 


Inferences  in  regression  analysis 


In this chapter, we first take up inferences concerning the regression parameters $\beta_0$ and $\beta_1$, considering both interval estimation of these parameters and tests about them. We then discuss interval estimation of the mean $E(Y)$ of the probability distribution of $Y$, for given $X$, and prediction intervals for a new observation $Y$, given $X$. Finally, we take up the analysis of variance approach to regression analysis.

Throughout  this  chapter,  and  in  the  remainder  of  Part  I  unless  otherwise 
stated,  we  assume  that  the  normal  error  model  (2.25)  is  applicable.  This  model 
is: 

(3.1)  $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

where:

$\beta_0$ and $\beta_1$ are parameters
$X_i$ are known constants
$\varepsilon_i$ are independent $N(0, \sigma^2)$

3.1  INFERENCES CONCERNING $\beta_1$

Frequently, we are interested in drawing inferences about $\beta_1$, the slope of the regression line in model (3.1). For instance, a market research analyst studying the relation between sales ($Y$) and advertising expenditures ($X$) may wish to obtain an interval estimate of $\beta_1$ because it will provide information as to how many additional sales dollars, on the average, are generated by an additional dollar of advertising expenditure.

At times, tests concerning $\beta_1$ are of interest, particularly one of the form:

$H_0$: $\beta_1 = 0$
$H_a$: $\beta_1 \neq 0$

The reason for interest in testing whether or not $\beta_1 = 0$ is that $\beta_1 = 0$ indicates that there is no linear association between $Y$ and $X$. Figure 3.1 illustrates the case when $\beta_1 = 0$ for the normal error model (3.1). Note that the regression line is horizontal and that the means of the probability distributions of $Y$ are therefore all equal, namely:

$E(Y) = \beta_0 + (0)X = \beta_0$

Since model (3.1) assumes normality of the probability distributions of $Y$ with constant variance, and since the means are equal when $\beta_1 = 0$, it follows that the probability distributions of $Y$ are identical when $\beta_1 = 0$. This is shown in Figure 3.1. Thus, $\beta_1 = 0$ for model (3.1) implies not only that there is no linear association between $Y$ and $X$ but also that there is no relation of any type between $Y$ and $X$, since the probability distributions of $Y$ are then identical at all levels of $X$.

Before discussing inferences concerning $\beta_1$ further, we need to consider the sampling distribution of $b_1$, our point estimator of $\beta_1$.

FIGURE 3.1  Model (3.1) when $\beta_1 = 0$


Sampling distribution of $b_1$


The point estimator $b_1$ was given in (2.10a) as follows:

(3.2)  $b_1 = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$

The sampling distribution of $b_1$ refers to the different values of $b_1$ that would be obtained with repeated sampling when the levels of the independent variable $X$ are held constant from sample to sample.

(3.3)  For model (3.1), the sampling distribution of $b_1$ is normal, with mean and variance:

(3.3a)  $E(b_1) = \beta_1$

(3.3b)  $\sigma^2(b_1) = \dfrac{\sigma^2}{\sum (X_i - \bar{X})^2}$

To show this, we need to recognize that $b_1$ is a linear combination of the observations $Y_i$.


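$b_1$ as linear combination of the $Y_i$.  We will show that $b_1$, as defined in (3.2), can be written:

$b_1 = \sum k_i Y_i$

Here the $k_i$ are constants; hence, $b_1$ is a linear combination of the $Y_i$. We first prove:

(3.4)  $\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum (X_i - \bar{X}) Y_i$

This follows since:

$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum (X_i - \bar{X}) Y_i - \sum (X_i - \bar{X}) \bar{Y}$

But $\sum (X_i - \bar{X}) \bar{Y} = \bar{Y} \sum (X_i - \bar{X}) = 0$ since $\sum (X_i - \bar{X}) = 0$. Hence, (3.4) holds. We now express $b_1$, using (3.4):

$b_1 = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \dfrac{\sum (X_i - \bar{X}) Y_i}{\sum (X_i - \bar{X})^2}$

We can rewrite this as follows:

(3.5)  $b_1 = \sum k_i Y_i$

where:

(3.5a)  $k_i = \dfrac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2}$

Observe that the $k_i$ are fixed quantities since the $X_i$ are fixed.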

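Note

The quantities $k_i$ have a number of interesting properties that will be used later:

1.
(3.6)  $\sum k_i = 0$

because $\sum k_i = \sum (X_i - \bar{X}) / \sum (X_i - \bar{X})^2 = 0 / \sum (X_i - \bar{X})^2 = 0$.

2.
(3.7)  $\sum k_i X_i = 1$

because $\sum k_i X_i = \sum (X_i - \bar{X}) X_i / \sum (X_i - \bar{X})^2 = \sum (X_i - \bar{X})(X_i - \bar{X}) / \sum (X_i - \bar{X})^2 = \sum (X_i - \bar{X})^2 / \sum (X_i - \bar{X})^2 = 1$. The result $\sum (X_i - \bar{X})(X_i - \bar{X}) = \sum (X_i - \bar{X}) X_i$ is obtained in the same way as (3.4).

3.
(3.8)  $\sum k_i^2 = \dfrac{1}{\sum (X_i - \bar{X})^2}$

because:

$\sum k_i^2 = \sum \left[ \dfrac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2} \right]^2 = \dfrac{1}{\left[ \sum (X_i - \bar{X})^2 \right]^2} \sum (X_i - \bar{X})^2 = \dfrac{1}{\sum (X_i - \bar{X})^2}$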

Normality.  We return now to the sampling distribution of $b_1$ for the normal error model (3.1). The normality of the sampling distribution of $b_1$ follows at once from the fact that $b_1$ is a linear combination of the $Y_i$. The $Y_i$ are independently, normally distributed according to model (3.1), and theorem (1.35) states that a linear combination of independent normal random variables is normally distributed.


Mean.  The unbiasedness of the point estimator $b_1$, stated earlier in the Gauss-Markov theorem (2.11), is easy to show:

$E(b_1) = E(\sum k_i Y_i) = \sum k_i E(Y_i) = \sum k_i (\beta_0 + \beta_1 X_i) = \beta_0 \sum k_i + \beta_1 \sum k_i X_i$

Hence, by (3.6) and (3.7), $E(b_1) = \beta_1$.


Variance.  The variance of $b_1$ can be derived readily. We need only remember that the $Y_i$ are independent random variables, each with variance $\sigma^2$, and that the $k_i$ are constants. Hence, we obtain by (1.26):

$\sigma^2(b_1) = \sigma^2(\sum k_i Y_i) = \sum k_i^2 \sigma^2(Y_i) = \sum k_i^2 \sigma^2 = \sigma^2 \sum k_i^2 = \sigma^2 \dfrac{1}{\sum (X_i - \bar{X})^2}$

The last step follows from (3.8).
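The mean and variance results can also be seen empirically by simulating repeated samples at fixed X levels. The following is a minimal sketch, not part of the original development: the true parameter values, the X levels, the seed, and the number of replications are all assumptions made for illustration.

```python
import random
import statistics

random.seed(0)

# Assumed "true" model for the simulation (illustrative values only)
BETA0, BETA1, SIGMA = 10.0, 2.0, 7.5 ** 0.5
X = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]   # fixed X levels, reused in every sample

n = len(X)
x_bar = sum(X) / n
sxx = sum((x - x_bar) ** 2 for x in X)
k = [(x - x_bar) / sxx for x in X]              # constants k_i from (3.5a)

def draw_b1():
    """Draw one sample of Y at the fixed X levels and return b1 = sum(k_i * Y_i)."""
    ys = [BETA0 + BETA1 * x + random.gauss(0.0, SIGMA) for x in X]
    return sum(ki * y for ki, y in zip(k, ys))

slopes = [draw_b1() for _ in range(10_000)]

mean_b1 = statistics.fmean(slopes)
var_b1 = statistics.pvariance(slopes)
theory_var = SIGMA ** 2 / sxx                   # (3.3b)

print(f"mean of simulated b1:     {mean_b1:.4f}  (beta1 = {BETA1})")
print(f"variance of simulated b1: {var_b1:.6f}  (sigma^2 / Sxx = {theory_var:.6f})")
```

With enough replications the simulated mean should settle near $\beta_1$ and the simulated variance near $\sigma^2 / \sum (X_i - \bar{X})^2$; the exact figures vary with the seed.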


Estimated variance.  We can estimate the variance of the sampling distribution of $b_1$:

$\sigma^2(b_1) = \dfrac{\sigma^2}{\sum (X_i - \bar{X})^2}$

by replacing the parameter $\sigma^2$ with the unbiased estimator of $\sigma^2$, namely, MSE:

(3.9)  $s^2(b_1) = \dfrac{MSE}{\sum (X_i - \bar{X})^2} = \dfrac{MSE}{\sum X_i^2 - \dfrac{(\sum X_i)^2}{n}}$

The point estimator $s^2(b_1)$ is an unbiased estimator of $\sigma^2(b_1)$. Taking the square root, we obtain $s(b_1)$, our point estimator of $\sigma(b_1)$.

Note

We stated in theorem (2.11) that $b_1$ has minimum variance among all unbiased linear estimators of the form:

$\hat{\beta}_1 = \sum c_i Y_i$

where the $c_i$ are arbitrary constants. We shall now prove this. Since $\hat{\beta}_1$ must be unbiased, the following must hold:

$E(\hat{\beta}_1) = E(\sum c_i Y_i) = \sum c_i E(Y_i) = \beta_1$

Now $E(Y_i) = \beta_0 + \beta_1 X_i$ by (2.2) so that the above condition becomes:

$E(\hat{\beta}_1) = \sum c_i (\beta_0 + \beta_1 X_i) = \beta_0 \sum c_i + \beta_1 \sum c_i X_i = \beta_1$

For the unbiasedness condition to hold, the $c_i$ must follow the restrictions:

$\sum c_i = 0 \qquad \sum c_i X_i = 1$

Now the variance of $\hat{\beta}_1$ is by (1.26):

$\sigma^2(\hat{\beta}_1) = \sum c_i^2 \sigma^2(Y_i) = \sigma^2 \sum c_i^2$

Let us define $c_i = k_i + d_i$, where the $k_i$ are the least squares constants in (3.5) and the $d_i$ are arbitrary constants. We can then write:

$\sigma^2(\hat{\beta}_1) = \sigma^2 \sum c_i^2 = \sigma^2 \sum (k_i + d_i)^2 = \sigma^2 [\sum k_i^2 + \sum d_i^2 + 2 \sum k_i d_i]$

We know that $\sigma^2 \sum k_i^2 = \sigma^2(b_1)$ from our proof above. Further, $\sum k_i d_i = 0$; this follows from the restrictions on the $k_i$ and the $c_i$. Hence, we have:

$\sigma^2(\hat{\beta}_1) = \sigma^2(b_1) + \sigma^2 \sum d_i^2$

Note that the smallest value of $\sum d_i^2$ is zero. Hence, the variance of $\hat{\beta}_1$ is at a minimum when $\sum d_i^2 = 0$. But this can only occur if all $d_i = 0$, which implies $c_i = k_i$. Thus, the least squares estimator $b_1$ has minimum variance among all unbiased linear estimators.
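To make the argument of this note concrete, we can compare the least squares weights with those of one particular competing unbiased linear estimator. The competitor used below, the slope of the line through the observations at the smallest and largest X levels, is our own choice for illustration, and the X levels are assumed values; neither comes from the text.

```python
X = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]   # illustrative X levels (assumed)
n = len(X)
x_bar = sum(X) / n
sxx = sum((x - x_bar) ** 2 for x in X)

# Least squares weights k_i from (3.5a)
k = [(x - x_bar) / sxx for x in X]

# Competing linear estimator (Y_max - Y_min) / (X_max - X_min),
# written in the form sum(c_i * Y_i)
i_min, i_max = X.index(min(X)), X.index(max(X))
c = [0.0] * n
c[i_max] = 1.0 / (X[i_max] - X[i_min])
c[i_min] = -1.0 / (X[i_max] - X[i_min])

# Unbiasedness restrictions: sum(c_i) = 0 and sum(c_i * X_i) = 1
sum_c = sum(c)
sum_cx = sum(ci * x for ci, x in zip(c, X))

# Decomposition c_i = k_i + d_i; the cross term sum(k_i * d_i) should vanish
d = [ci - ki for ci, ki in zip(c, k)]
sum_kd = sum(ki * di for ki, di in zip(k, d))

# Variances expressed as multiples of sigma^2
var_factor_c = sum(ci ** 2 for ci in c)         # competitor
var_factor_k = sum(ki ** 2 for ki in k)         # least squares
sum_d2 = sum(di ** 2 for di in d)

print("restrictions:", sum_c, sum_cx)
print("cross term  :", sum_kd)
print("variance factors (competitor, least squares):", var_factor_c, var_factor_k)
```

The competitor satisfies both restrictions, so it is unbiased, yet its variance factor exceeds that of $b_1$ by exactly $\sum d_i^2$, as the decomposition in the note predicts.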


Sampling distribution of $(b_1 - \beta_1)/s(b_1)$

Since $b_1$ is normally distributed, we know that the standardized statistic $(b_1 - \beta_1)/\sigma(b_1)$ is a standard normal variable. Ordinarily, of course, we need to estimate $\sigma(b_1)$ by $s(b_1)$, and hence are interested in the distribution of the standardized statistic $(b_1 - \beta_1)/s(b_1)$. An important theorem in statistics states:

(3.10)  $\dfrac{b_1 - \beta_1}{s(b_1)}$ is distributed as $t(n - 2)$ for model (3.1)

Intuitively, this result should not be unexpected. We know that if the observations $Y_i$ come from the same normal population, $(\bar{Y} - \mu)/s(\bar{Y})$ follows the $t$ distribution with $n - 1$ degrees of freedom. The estimator $b_1$, like $\bar{Y}$, is a linear combination of the observations $Y_i$. The reason for the difference in the degrees of freedom is that two parameters ($\beta_0$ and $\beta_1$) need to be estimated for the regression model; hence, two degrees of freedom are lost here.

Note

We can show that $(b_1 - \beta_1)/s(b_1)$ is distributed as $t$ with $n - 2$ degrees of freedom by relying on the following theorem:

(3.11)  For model (3.1), $SSE/\sigma^2$ is distributed as $\chi^2$ with $n - 2$ degrees of freedom, and is independent of $b_0$ and $b_1$.

First, let us rewrite $(b_1 - \beta_1)/s(b_1)$ as follows:

$\dfrac{b_1 - \beta_1}{\sigma(b_1)} \div \dfrac{s(b_1)}{\sigma(b_1)}$

The numerator is a standard normal variable $z$. The nature of the denominator can be seen by first considering:

$\dfrac{s^2(b_1)}{\sigma^2(b_1)} = \dfrac{MSE / \sum (X_i - \bar{X})^2}{\sigma^2 / \sum (X_i - \bar{X})^2} = \dfrac{MSE}{\sigma^2} = \dfrac{SSE/(n - 2)}{\sigma^2} = \dfrac{SSE}{\sigma^2 (n - 2)} \sim \dfrac{\chi^2(n - 2)}{n - 2}$

where the symbol $\sim$ stands for "is distributed as." The last step follows from (3.11). Hence, we have:

$\dfrac{b_1 - \beta_1}{s(b_1)} \sim \dfrac{z}{\sqrt{\dfrac{\chi^2(n - 2)}{n - 2}}}$

But by theorem (3.11), $z$ and $\chi^2$ are independent, since $z$ is a function of $b_1$ and $b_1$ is independent of $SSE/\sigma^2 \sim \chi^2$. Hence, by definition (1.39), it follows that:

$\dfrac{b_1 - \beta_1}{s(b_1)} \sim t(n - 2)$

This result places us in a position to readily make inferences concerning $\beta_1$.


Confidence interval for $\beta_1$

Since $(b_1 - \beta_1)/s(b_1)$ follows a $t$ distribution, we can make the following probability statement:

(3.12)  $P\{t(\alpha/2; n - 2) \le (b_1 - \beta_1)/s(b_1) \le t(1 - \alpha/2; n - 2)\} = 1 - \alpha$

Here, $t(\alpha/2; n - 2)$ denotes the $(\alpha/2)100$ percentile of the $t$ distribution with $n - 2$ degrees of freedom. Because of the symmetry of the $t$ distribution, it follows that:

(3.13)  $t(\alpha/2; n - 2) = -t(1 - \alpha/2; n - 2)$

Rearranging the inequalities in (3.12) and using (3.13), we obtain:

(3.14)  $P\{b_1 - t(1 - \alpha/2; n - 2) s(b_1) \le \beta_1 \le b_1 + t(1 - \alpha/2; n - 2) s(b_1)\} = 1 - \alpha$

Since (3.14) holds for all possible values of $\beta_1$, the $1 - \alpha$ confidence limits for $\beta_1$ are:

(3.15)  $b_1 \pm t(1 - \alpha/2; n - 2) s(b_1)$


Example.  Let us return to the Westwood Company lot size example of Chapter 2. Management wishes an estimate of $\beta_1$ with a 95 percent confidence coefficient. We summarize in Table 3.1 the needed results obtained earlier. First, we need to obtain $s(b_1)$:

$s^2(b_1) = \dfrac{MSE}{\sum (X_i - \bar{X})^2} = \dfrac{7.5}{3,400} = .002206$

and:

$s(b_1) = .04697$

For a 95 percent confidence coefficient, we require $t(.975; 8)$. From Table A-2 in the Appendix, we find $t(.975; 8) = 2.306$. The 95 percent confidence interval, by (3.15), then is:

$2.0 - 2.306(.04697) \le \beta_1 \le 2.0 + 2.306(.04697)$

$1.89 \le \beta_1 \le 2.11$

Thus, with confidence coefficient .95, we estimate that the mean number of man-hours increases by somewhere between 1.89 and 2.11 for each increase of one part in the lot size.

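TABLE 3.1  Results for Westwood Company example obtained in Chapter 2

$n = 10$                         $\bar{X} = 50$
$b_0 = 10.0$                     $b_1 = 2.0$
$\hat{Y} = 10.0 + 2.0X$          $SSE = 60$
$\sum X_i^2 = 28,400$            $MSE = 7.5$

$\sum X_i^2 - \dfrac{(\sum X_i)^2}{n} = \sum (X_i - \bar{X})^2 = 3,400$

$\sum X_i Y_i - \dfrac{\sum X_i \sum Y_i}{n} = \sum (X_i - \bar{X})(Y_i - \bar{Y}) = 6,800$

$\sum Y_i^2 - \dfrac{(\sum Y_i)^2}{n} = \sum (Y_i - \bar{Y})^2 = 13,660$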
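The confidence interval for $\beta_1$ in the Westwood Company example can be reproduced from the Table 3.1 summary results with a few lines of code. This is a sketch only; the variable names are ours, and the $t$ value is hard-coded from Table A-2 rather than computed.

```python
import math

# Summary results from Table 3.1
n = 10
b1 = 2.0
mse = 7.5
sxx = 3400.0            # sum of (X_i - X_bar)^2

# t(.975; 8) as read from Table A-2 in the text (hard-coded, not computed)
t_crit = 2.306

s2_b1 = mse / sxx                                  # (3.9)
s_b1 = math.sqrt(s2_b1)
half_width = t_crit * s_b1
lower, upper = b1 - half_width, b1 + half_width    # (3.15)

print(f"s^2(b1) = {s2_b1:.6f}, s(b1) = {s_b1:.5f}")
print(f"95 percent confidence interval for beta1: ({lower:.2f}, {upper:.2f})")
```

Changing `t_crit` to another tabled percentile gives the interval for a different confidence coefficient.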


Note

In Chapter 2, we noted that the scope of a regression model is restricted ordinarily to some interval of values of the independent variable. This is particularly important to keep in mind when using estimates of the slope $\beta_1$. In our lot size example, a linear regression model appeared appropriate for lot sizes between 20 and 80, the range of the independent variable in the recent past. It may not be reasonable to use the estimate of the slope to infer the effect of lot size on number of man-hours far outside this range since the regression relation may not be linear there.


Tests concerning $\beta_1$

Since $(b_1 - \beta_1)/s(b_1)$ is distributed as $t$ with $n - 2$ degrees of freedom, tests concerning $\beta_1$ can be set up in ordinary fashion using the $t$ distribution.


Example 1: Two-sided test.  Suppose a cost analyst in the Westwood Company is interested in testing whether or not there is a linear association between man-hours and lot size, using regression model (3.1). The two alternatives then are:

(3.16)  $H_0$: $\beta_1 = 0$
        $H_a$: $\beta_1 \neq 0$

If the analyst wishes to control the risk of a Type I error at .05, he could indeed conclude $H_a$ at once by referring to the 95 percent confidence interval for $\beta_1$ constructed earlier, since the interval does not include 0.

An explicit test of the alternatives (3.16) is based on the test statistic:

(3.17)  $t^* = \dfrac{b_1}{s(b_1)}$

The decision rule with this test statistic when controlling the level of significance at $\alpha$ is:

(3.17a)  If $|t^*| \le t(1 - \alpha/2; n - 2)$, conclude $H_0$
         If $|t^*| > t(1 - \alpha/2; n - 2)$, conclude $H_a$

For the Westwood Company example, where $\alpha = .05$, $b_1 = 2.0$, and $s(b_1) = .04697$, we require $t(.975; 8) = 2.306$. Thus, the decision rule for testing alternatives (3.16) is:

If $|t^*| \le 2.306$, conclude $H_0$
If $|t^*| > 2.306$, conclude $H_a$

Since $|t^*| = |2.0/.04697| = 42.58 > 2.306$, we conclude $H_a$, that $\beta_1 \neq 0$ or that there is a linear association between man-hours and lot size.

The P-value for the sample outcome is obtained by finding the probability $P[t(8) > t^* = 42.58]$. We see from Table A-2 that this probability is less than .0005. Indeed, it can be shown to be almost 0, to be denoted by 0+. Thus, the two-sided P-value is 2(0+) = 0+. Since the two-sided P-value is less than the specified level of significance $\alpha = .05$, we could conclude $H_a$ directly.

Example 2: One-sided test.  If the analyst had wished to test whether or not $\beta_1$ is positive, controlling the level of significance at $\alpha = .05$, the alternatives would have been:

$H_0$: $\beta_1 \le 0$
$H_a$: $\beta_1 > 0$

and the decision rule based on test statistic (3.17) would have been:

If $t^* \le t(1 - \alpha; n - 2)$, conclude $H_0$
If $t^* > t(1 - \alpha; n - 2)$, conclude $H_a$

For $\alpha = .05$, we require $t(.95; 8) = 1.860$. Since $t^* = 42.58 > 1.860$, we would conclude $H_a$, that $\beta_1$ is positive.

This same conclusion could be reached directly from the one-sided P-value, which was noted in Example 1 to be 0+. Since this P-value is less than .05, we would conclude $H_a$.

Comments 


1.  Many computer packages and scientific publications commonly report the P-value together with the value of the test statistic. In this way, one can conduct a test at any desired level of significance $\alpha$ by comparing the P-value with the specified level $\alpha$. Users of computer packages need to be careful to ascertain whether one-sided or two-sided P-values are furnished.

2.  Occasionally, it is desired to test whether or not $\beta_1$ equals some specified nonzero value $\beta_{10}$, which may be a historical norm, the value for a comparable process, or an engineering specification. For such a test, the appropriate test statistic is:

(3.18)  $t^* = \dfrac{b_1 - \beta_{10}}{s(b_1)}$

The decision rule to be used for the alternatives:

$H_0$: $\beta_1 = \beta_{10}$
$H_a$: $\beta_1 \neq \beta_{10}$

is still (3.17a), but it now is based on $t^*$ defined in (3.18).

Note that test statistic (3.18) simplifies to test statistic (3.17) when the test involves $H_0$: $\beta_1 = \beta_{10} = 0$.
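The two-sided and one-sided tests of Examples 1 and 2, together with the general statistic (3.18), can be sketched as small functions. The function names are hypothetical, introduced only for this illustration, and the critical values are hard-coded from Table A-2 as quoted in the examples.

```python
import math

def slope_t_statistic(b1, s_b1, beta10=0.0):
    """Test statistic (3.18); reduces to (3.17) when beta10 is 0."""
    return (b1 - beta10) / s_b1

def two_sided_decision(t_star, t_crit):
    """Two-sided rule: conclude Ha when |t*| exceeds the critical value."""
    return "Ha" if abs(t_star) > t_crit else "H0"

def upper_tail_decision(t_star, t_crit):
    """One-sided rule for the alternative Ha: beta1 > 0."""
    return "Ha" if t_star > t_crit else "H0"

# Westwood Company summary results
b1, mse, sxx = 2.0, 7.5, 3400.0
s_b1 = math.sqrt(mse / sxx)

t_star = slope_t_statistic(b1, s_b1)

# Critical values read from Table A-2 in the text (hard-coded)
T_975_8 = 2.306   # two-sided test at alpha = .05
T_95_8 = 1.860    # one-sided test at alpha = .05

print(f"t* = {t_star:.2f}")
print("two-sided test concludes:", two_sided_decision(t_star, T_975_8))
print("one-sided test concludes:", upper_tail_decision(t_star, T_95_8))
```

Passing a nonzero `beta10` tests the slope against a specified norm without changing either decision rule.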


3.2  INFERENCES CONCERNING $\beta_0$

As noted in Chapter 2, there are only infrequent occasions when we wish to make inferences concerning $\beta_0$, the intercept of the regression line. These occur when the scope of the model includes $X = 0$.


Sampling distribution of $b_0$

The point estimator $b_0$ was given in (2.10b) as follows:

(3.19)  $b_0 = \bar{Y} - b_1 \bar{X}$


The  sampling  distribution  of  b0  refers  to  the  different  values  of  b0  that  would  be 
obtained  with  repeated  sampling  when  the  levels  of  the  independent  variable  X 
are  held  constant  from  sample  to  sample. 


(3.20)  For model (3.1), the sampling distribution of $b_0$ is normal, with mean and variance:

(3.20a)  $E(b_0) = \beta_0$

(3.20b)  $\sigma^2(b_0) = \sigma^2 \dfrac{\sum X_i^2}{n \sum (X_i - \bar{X})^2} = \sigma^2 \left[ \dfrac{1}{n} + \dfrac{\bar{X}^2}{\sum (X_i - \bar{X})^2} \right]$

The normality of the sampling distribution of $b_0$ follows because $b_0$, like $b_1$, is a linear combination of the observations $Y_i$. The results for the mean and variance of the sampling distribution of $b_0$ can be obtained in similar fashion as those for $b_1$.

An estimator of $\sigma^2(b_0)$ is obtained by replacing $\sigma^2$ by its point estimator MSE:

(3.21)  $s^2(b_0) = MSE \dfrac{\sum X_i^2}{n \sum (X_i - \bar{X})^2} = MSE \left[ \dfrac{1}{n} + \dfrac{\bar{X}^2}{\sum (X_i - \bar{X})^2} \right]$

The square root, $s(b_0)$, is an estimator of $\sigma(b_0)$.


Sampling distribution of $(b_0 - \beta_0)/s(b_0)$

Analogous to theorem (3.10) for $b_1$, there is a theorem for $b_0$ that states:

(3.22)  $\dfrac{b_0 - \beta_0}{s(b_0)}$ is distributed as $t(n - 2)$ for model (3.1)

Hence, confidence intervals for $\beta_0$ and tests concerning $\beta_0$ can be set up in ordinary fashion, using the $t$ distribution.


Confidence interval for $\beta_0$

The $1 - \alpha$ confidence limits for $\beta_0$ are obtained in the same manner as those for $\beta_1$ derived earlier. They are:

(3.23)  $b_0 \pm t(1 - \alpha/2; n - 2) s(b_0)$

Example.  As noted earlier, the scope of the model for the Westwood Company example does not extend to lot sizes of $X = 0$. Hence, the regression parameter $\beta_0$ may not have intrinsic meaning. If, nevertheless, a 90 percent confidence interval for $\beta_0$ were desired, we would proceed by finding $t(.95; 8)$ and $s(b_0)$. From Table A-2, we find $t(.95; 8) = 1.860$. Using the earlier results summarized in Table 3.1, we obtain by (3.21):

$s^2(b_0) = MSE \dfrac{\sum X_i^2}{n \sum (X_i - \bar{X})^2} = (7.5) \dfrac{28,400}{10(3,400)} = 6.26471$

or:

$s(b_0) = 2.50294$

Hence, the 90 percent confidence interval for $\beta_0$ is:

$10.0 - 1.860(2.50294) \le \beta_0 \le 10.0 + 1.860(2.50294)$

$5.34 \le \beta_0 \le 14.66$

We caution again that this confidence interval does not necessarily provide meaningful information. For instance, it does not necessarily provide information about the "setup" costs of producing a lot of parts (costs incurred in setting up the production process, no matter what the lot size is), since we are not certain whether a linear regression model is appropriate when the scope of the model is extended to $X = 0$.
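The interval for $\beta_0$ can be reproduced in the same way from the Table 3.1 summary results. The sketch below also confirms that the two forms of (3.21) agree; the variable names are ours and the $t$ value is hard-coded from Table A-2.

```python
import math

# Summary results from Table 3.1
n = 10
b0 = 10.0
mse = 7.5
sum_x2 = 28400.0        # sum of X_i^2
sxx = 3400.0            # sum of (X_i - X_bar)^2
x_bar = 50.0

t_crit = 1.860          # t(.95; 8) from Table A-2 in the text, hard-coded

# Two algebraically equivalent forms of (3.21)
s2_b0 = mse * sum_x2 / (n * sxx)
s2_b0_alt = mse * (1.0 / n + x_bar ** 2 / sxx)

s_b0 = math.sqrt(s2_b0)
lower = b0 - t_crit * s_b0          # (3.23)
upper = b0 + t_crit * s_b0

print(f"s^2(b0) = {s2_b0:.5f} (alternative form: {s2_b0_alt:.5f})")
print(f"s(b0) = {s_b0:.5f}")
print(f"90 percent confidence interval for beta0: ({lower:.2f}, {upper:.2f})")
```

The two forms agree because $\sum X_i^2 = \sum (X_i - \bar{X})^2 + n\bar{X}^2$.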


3.3  SOME CONSIDERATIONS ON MAKING INFERENCES
CONCERNING $\beta_0$ AND $\beta_1$

Effect  of  departures  from  normality 

If the probability distributions of $Y$ are not exactly normal but do not depart seriously, the sampling distributions of $b_0$ and $b_1$ will be approximately normal, and the use of the $t$ distribution will provide approximately the specified confidence coefficient or level of significance. Even if the distributions of $Y$ are far from normal, the estimators $b_0$ and $b_1$ generally have the property of asymptotic normality: their distributions approach normality under very general conditions as the sample size increases. Thus, with sufficiently large samples, the confidence intervals and decision rules given earlier still apply even if the probability distributions of $Y$ depart far from normality. For large samples, the $t$ value is, of course, replaced by the $z$ value for the standard normal distribution.


Interpretation  of  confidence  coefficient  and  risks  of  errors 

Since model (3.1) assumes that the $X_i$ are known constants, the confidence coefficient and risks of errors are interpreted with respect to taking repeated samples in which the $X$ observations are kept at the same levels as in the observed sample. For instance, we constructed a confidence interval for $\beta_1$ with a confidence coefficient of .95 in the Westwood Company example. This coefficient is interpreted to mean that if many independent samples are taken where the levels of $X$ (the lot sizes) in the first sample are repeated in these other samples and a 95 percent confidence interval is constructed for each sample, 95 percent of the intervals will contain the true value of $\beta_1$.
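This repeated-sampling interpretation lends itself to a direct simulation. In the sketch below, the true parameter values, the X levels, the seed, and the number of replications are assumptions chosen for illustration; the $t$ value is the tabled $t(.975; 8)$.

```python
import math
import random

random.seed(1)

# Assumed true parameters for the simulation (illustrative only)
BETA0, BETA1, SIGMA = 10.0, 2.0, math.sqrt(7.5)
X = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]   # X levels held fixed across samples
T_CRIT = 2.306                                  # t(.975; 8), from Table A-2

n = len(X)
x_bar = sum(X) / n
sxx = sum((x - x_bar) ** 2 for x in X)

def interval_covers_beta1():
    """Draw one sample at the fixed X levels; report whether the 95% CI contains BETA1."""
    ys = [BETA0 + BETA1 * x + random.gauss(0.0, SIGMA) for x in X]
    y_bar = sum(ys) / n
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(X, ys))
    s_b1 = math.sqrt(sse / (n - 2) / sxx)
    return abs(b1 - BETA1) <= T_CRIT * s_b1

REPS = 10_000
coverage = sum(interval_covers_beta1() for _ in range(REPS)) / REPS
print(f"fraction of intervals containing beta1: {coverage:.3f}")
```

The reported fraction should be close to .95, with the remaining discrepancy due to simulation error.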


Spacing  of  the  X  levels 

Inspection of formulas (3.3b) and (3.20b) for the variances of $b_1$ and $b_0$, respectively, indicates that for given $n$ and $\sigma^2$, these variances are affected by the spacing of the $X$ levels in the observed data. For example, the more the spread in the $X$ levels, the larger is the quantity $\sum (X_i - \bar{X})^2$ and the smaller is the variance of $b_1$. We will discuss in Section 5.9 how the $X$ observations should be spaced in experiments where spacing can be controlled.
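The effect of spacing on the variance of $b_1$ is easy to tabulate for a few candidate designs. The three designs and the error variance below are hypothetical values chosen for illustration; each design has $n = 10$ and mean level 50.

```python
def slope_variance(x_levels, sigma2):
    """Variance of b1 from (3.3b): sigma^2 / sum((X_i - X_bar)^2)."""
    x_bar = sum(x_levels) / len(x_levels)
    sxx = sum((x - x_bar) ** 2 for x in x_levels)
    return sigma2 / sxx

SIGMA2 = 7.5   # assumed error variance, for illustration only

# Three hypothetical designs with n = 10 and the same mean X level (50)
clustered = [45, 46, 47, 48, 49, 51, 52, 53, 54, 55]
even_spread = [20, 27, 33, 40, 47, 53, 60, 67, 73, 80]
endpoints = [20] * 5 + [80] * 5

designs = [("clustered", clustered), ("even spread", even_spread), ("endpoints", endpoints)]
for name, design in designs:
    print(f"{name:12s} var(b1) = {slope_variance(design, SIGMA2):.6f}")
```

The variance of $b_1$ falls as the levels spread out. Whether a design concentrated at the extremes is advisable on other grounds is a separate question, which this sketch does not address.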


Power  of  tests 


The power of tests on $\beta_0$ and $\beta_1$ can be obtained from Table A-5 in the Appendix, which contains charts of the power function of the $t$ test. Consider, for example, the general decision problem:

(3.24)  $H_0$: $\beta_1 = \beta_{10}$
        $H_a$: $\beta_1 \neq \beta_{10}$

for which the general test statistic (3.18) is employed:

(3.24a)  $t^* = \dfrac{b_1 - \beta_{10}}{s(b_1)}$

and the decision rule for level of significance $\alpha$ is:

(3.24b)  If $|t^*| \le t(1 - \alpha/2; n - 2)$, conclude $H_0$
         If $|t^*| > t(1 - \alpha/2; n - 2)$, conclude $H_a$

The power of the test is the probability that the decision rule will lead to conclusion $H_a$ when $H_a$ in fact holds. Specifically, the power is given by:

(3.25)  Power $= P\{|t^*| > t(1 - \alpha/2; n - 2) \mid \delta\}$

where $\delta$ is a measure of noncentrality, i.e., how far the true value of $\beta_1$ is from $\beta_{10}$:

(3.26)  $\delta = \dfrac{|\beta_1 - \beta_{10}|}{\sigma(b_1)}$
Table A-5 presents the power of the two-sided $t$ test (in percent) for $\alpha = .01$ and $\alpha = .05$, for various degrees of freedom $df$. To illustrate the use of this table, let us return to the Westwood Company example where we tested:

$H_0$: $\beta_1 = \beta_{10} = 0$
$H_a$: $\beta_1 \neq \beta_{10} = 0$

Suppose we wish to know the power of the test when $\beta_1 = .25$. To ascertain this, we need to know $\sigma^2$, the variance of the error terms. Assume that $\sigma^2 = 10.0$ so that $\sigma^2(b_1)$ for our example would be:

$\sigma^2(b_1) = \dfrac{\sigma^2}{\sum (X_i - \bar{X})^2} = \dfrac{10.0}{3,400} = .002941$

or $\sigma(b_1) = .05423$. Then $\delta = |.25 - 0| / .05423 = 4.6$. We enter the graph for $\alpha = .05$ (the level of significance used in the test) and approximate visually the curve for eight degrees of freedom. Reading the ordinate at $\delta = 4.6$, we obtain 97 percent approximately. Thus, if $\beta_1 = .25$, the probability would be about .97 that we would be led to conclude $H_a$ ($\beta_1 \neq 0$). In other words, if $\beta_1 = .25$, we would be almost certain to conclude that there is a relation between man-hours and lot size.
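The noncentrality measure for this illustration can be computed directly, and the chart reading can be cross-checked by simulation instead of Table A-5. The Monte Carlo check is our own substitute for the chart, not the method of the text; the X levels (chosen so that $\sum (X_i - \bar{X})^2 = 3,400$), the intercept, the seed, and the number of replications are assumptions of this sketch.

```python
import math
import random

random.seed(2)

# Settings from the power illustration in the text
BETA1_TRUE = 0.25
BETA10 = 0.0
SIGMA2 = 10.0
SXX = 3400.0
T_CRIT = 2.306          # t(.975; 8), from Table A-2

# Noncentrality measure (3.26)
sigma_b1 = math.sqrt(SIGMA2 / SXX)
delta = abs(BETA1_TRUE - BETA10) / sigma_b1

# Monte Carlo check. The X levels and the intercept are assumptions of this
# sketch; the X levels are chosen so that sum((X_i - X_bar)^2) = 3,400.
X = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
BETA0 = 10.0
n = len(X)
x_bar = sum(X) / n

def rejects_h0():
    """Simulate one sample under beta1 = .25 and apply decision rule (3.24b)."""
    ys = [BETA0 + BETA1_TRUE * x + random.gauss(0.0, math.sqrt(SIGMA2)) for x in X]
    y_bar = sum(ys) / n
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, ys)) / SXX
    b0 = y_bar - b1 * x_bar
    mse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(X, ys)) / (n - 2)
    t_star = (b1 - BETA10) / math.sqrt(mse / SXX)
    return abs(t_star) > T_CRIT

REPS = 10_000
power_estimate = sum(rejects_h0() for _ in range(REPS)) / REPS
print(f"sigma(b1) = {sigma_b1:.5f}, delta = {delta:.2f}")
print(f"simulated power = {power_estimate:.3f}")
```

The simulated power should fall in the neighborhood of the chart reading of about 97 percent; exact agreement is not expected, since the chart is read visually and the simulation carries sampling error.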

The power of tests concerning $\beta_0$ can be obtained from Table A-5 in completely analogous fashion. For one-sided tests, Table A-5 should be entered so that one half the level of significance shown there is the level of significance of the one-sided test.


3.4  INTERVAL ESTIMATION OF $E(Y_h)$

In regression analysis, one of the major goals usually is to estimate the mean for one or more probability distributions of $Y$. Consider, for example, a study of the relation between level of piecework pay ($X$) and worker productivity ($Y$). The mean productivity at high and medium levels of piecework pay may be of particular interest for purposes of analyzing the benefits obtained from an increase in the pay. As another example, the Westwood Company may be interested in the mean response (mean number of man-hours) for lot sizes of $X = 40$ parts, $X = 55$ parts, and $X = 70$ parts for purposes of choosing appropriate lot sizes for production.

Let $X_h$ denote the level of $X$ for which we wish to estimate the mean response. $X_h$ may be a value which occurred in the sample, or it may be some other value of the independent variable within the scope of the model. The mean response when $X = X_h$ is denoted by $E(Y_h)$. Formula (2.12) gives us the point estimator $\hat{Y}_h$ of $E(Y_h)$:

(3.27)  $\hat{Y}_h = b_0 + b_1 X_h$

We consider now the sampling distribution of $\hat{Y}_h$.

Sampling distribution of $\hat{Y}_h$

The sampling distribution of $\hat{Y}_h$, like the earlier sampling distributions discussed, refers to the different values of $\hat{Y}_h$ which would be obtained if repeated samples were selected, each holding the levels of the independent variable $X$ constant, and calculating $\hat{Y}_h$ for each sample.


(3.28)  For model (3.1), the sampling distribution of $\hat{Y}_h$ is normal, with mean and variance:

(3.28a)  $E(\hat{Y}_h) = E(Y_h)$

(3.28b)  $\sigma^2(\hat{Y}_h) = \sigma^2 \left[ \dfrac{1}{n} + \dfrac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right]$

Normality.  The normality of the sampling distribution of $\hat{Y}_h$ follows directly from the fact that $\hat{Y}_h$ is a linear combination of the observations $Y_i$.

Mean.  To prove that $\hat{Y}_h$ is an unbiased estimator of $E(Y_h)$, we proceed as follows:

$E(\hat{Y}_h) = E(b_0 + b_1 X_h) = E(b_0) + X_h E(b_1) = \beta_0 + \beta_1 X_h$

by (3.3a) and (3.20a).

Variance.  First, we show that $b_1$ and $\bar{Y}$ are uncorrelated and hence, for model (3.1), independent:

(3.29)  $\sigma(\bar{Y}, b_1) = 0$

where $\sigma(\bar{Y}, b_1)$ denotes the covariance between $\bar{Y}$ and $b_1$. We begin with the definitions:

$\bar{Y} = \sum \left( \dfrac{1}{n} \right) Y_i$

$b_1 = \sum k_i Y_i$

where $k_i$ is as defined in (3.5a). We now use theorem (1.27), with $a_i = 1/n$ and $c_i = k_i$; remember that the $Y_i$ are independent random variables:

$\sigma(\bar{Y}, b_1) = \sum \left( \dfrac{1}{n} \right) k_i \sigma^2(Y_i) = \dfrac{\sigma^2}{n} \sum k_i$

But we know from (3.6) that $\sum k_i = 0$. Hence, the covariance is 0.

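Now we are ready to find the variance of $\hat{Y}_h$. We shall use the estimator in the alternative form (2.15):

$\sigma^2(\hat{Y}_h) = \sigma^2(\bar{Y} + b_1[X_h - \bar{X}])$

Since $\bar{Y}$ and $b_1$ are independent and $X_h$ and $\bar{X}$ are constants, we obtain:

$\sigma^2(\hat{Y}_h) = \sigma^2(\bar{Y}) + (X_h - \bar{X})^2 \sigma^2(b_1)$

Now $\sigma^2(b_1)$ is given in (3.3b) and:

$\sigma^2(\bar{Y}) = \dfrac{\sigma^2(Y_i)}{n} = \dfrac{\sigma^2}{n}$

Hence:

$\sigma^2(\hat{Y}_h) = \dfrac{\sigma^2}{n} + (X_h - \bar{X})^2 \dfrac{\sigma^2}{\sum (X_i - \bar{X})^2}$

which, upon a slight rearrangement of terms, yields (3.28b).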

Note the effect of the term $(X_h - \bar{X})^2$ on $\sigma^2(\hat{Y}_h)$. The further $X_h$ is from $\bar{X}$, the greater is the quantity $(X_h - \bar{X})^2$ and the larger is the variance of $\hat{Y}_h$. An intuitive explanation of this effect is found in Figure 3.2. Shown there are two sample regression lines, based on two samples for the same set of $X$ values. The two regression lines are assumed to go through the same $(\bar{X}, \bar{Y})$ point to isolate the effect of interest, namely, the effect of variation in the estimated slope $b_1$ from sample to sample. Note that at $X_1$, near $\bar{X}$, the fitted values $\hat{Y}_1$ for the two sample regression lines are close to each other. At $X_2$, which is far from $\bar{X}$, the situation is different. Here, the fitted values $\hat{Y}_2$ differ substantially. Thus, variation in the slope $b_1$ from sample to sample has a much more pronounced effect on $\hat{Y}_h$ for $X$ levels far from the mean $\bar{X}$ than for $X$ levels near $\bar{X}$. Hence, the variation in the $\hat{Y}_h$ values from sample to sample will be greater when $X_h$ is far from the mean than when $X_h$ is near the mean.


FIGURE 3.2  Effect on $\hat{Y}_h$ of variation in $b_1$ from sample to sample in two samples with same means $\bar{Y}$ and $\bar{X}$


When MSE is substituted for $\sigma^2$ in (3.28b), we obtain $s^2(\hat{Y}_h)$, the estimated variance of $\hat{Y}_h$:

(3.30)  $s^2(\hat{Y}_h) = MSE \left[ \dfrac{1}{n} + \dfrac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right]$

The estimated standard deviation of $\hat{Y}_h$ then is $s(\hat{Y}_h)$, the square root of $s^2(\hat{Y}_h)$.

Sampling distribution of $[\hat{Y}_h - E(Y_h)]/s(\hat{Y}_h)$

Since we have encountered the $t$ distribution in each type of inference for regression model (3.1) considered up to this point, it should not be surprising that:

(3.31)  $\dfrac{\hat{Y}_h - E(Y_h)}{s(\hat{Y}_h)}$ is distributed as $t(n - 2)$ for model (3.1)

Hence, all inferences concerning $E(Y_h)$ are carried out in the usual fashion with the $t$ distribution. We illustrate the construction of confidence intervals, since in practice these are more frequently used than tests.

Confidence interval for $E(Y_h)$

A confidence interval for $E(Y_h)$ is constructed in the standard fashion, making use of the $t$ distribution as indicated by theorem (3.31). The $1 - \alpha$ confidence limits are:

(3.32)  $\hat{Y}_h \pm t(1 - \alpha/2; n - 2) s(\hat{Y}_h)$

Example 1. Returning to the Westwood Company lot size example, let us
find a 90 percent confidence interval for $E(Y_h)$ when the lot size is $X_h = 55$ parts.
Using the earlier results in Table 3.1, we find the point estimate $\hat{Y}_h$:

$\hat{Y}_{55} = 10.0 + 2.0(55) = 120$

Next, we need to find the estimated standard deviation $s(\hat{Y}_h)$. We obtain, using
(3.30):

$s^2(\hat{Y}_{55}) = 7.5\left[\dfrac{1}{10} + \dfrac{(55 - 50)^2}{3{,}400}\right] = .80515$

so that:

$s(\hat{Y}_{55}) = .89730$

For a 90 percent confidence coefficient, we require $t(.95;\ 8) = 1.860$. Hence,
our confidence interval with confidence coefficient .90 is by (3.32):

$120 - 1.860(.89730) \le E(Y_{55}) \le 120 + 1.860(.89730)$

$118.3 \le E(Y_{55}) \le 121.7$

We conclude with confidence coefficient .90 that the mean number of man-hours
required when lots of 55 parts are produced is somewhere between 118.3 and
121.7.

Example 2. Suppose the Westwood Company wishes to estimate $E(Y_h)$
when $X_h = 80$ parts with a 90 percent confidence interval. We require:

$\hat{Y}_{80} = 10.0 + 2.0(80) = 170$

$s^2(\hat{Y}_{80}) = 7.5\left[\dfrac{1}{10} + \dfrac{(80 - 50)^2}{3{,}400}\right] = 2.73529$

$s(\hat{Y}_{80}) = 1.65387$

$t(.95;\ 8) = 1.860$

Hence, the 90 percent confidence interval is:

$170 - 1.860(1.65387) \le E(Y_{80}) \le 170 + 1.860(1.65387)$

$166.9 \le E(Y_{80}) \le 173.1$


Note that this confidence interval is somewhat wider than that for example 1,
since the $X_h$ level here ($X_h = 80$) is substantially farther from the mean $\bar{X} = 50$
than the $X_h$ level for example 1 ($X_h = 55$).
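The two examples can be checked with a short calculation. The sketch below is only an illustration: the function name `mean_response_ci` and its argument names are hypothetical, the summary figures ($b_0 = 10.0$, $b_1 = 2.0$, MSE = 7.5, $n = 10$, $\bar{X} = 50$, $\sum(X_i - \bar{X})^2 = 3{,}400$) are the ones quoted in the text, and the t value 1.860 is taken from the text rather than computed.

```python
import math


def mean_response_ci(b0, b1, mse, n, x_bar, sxx, x_h, t_value):
    """Confidence limits (3.32) for E(Y_h), using the estimated variance (3.30).

    sxx is the sum of squared deviations of the X_i around their mean.
    t_value is t(1 - alpha/2; n - 2), supplied by the caller.
    """
    y_hat = b0 + b1 * x_h                                  # point estimate of E(Y_h)
    s2 = mse * (1.0 / n + (x_h - x_bar) ** 2 / sxx)        # formula (3.30)
    half_width = t_value * math.sqrt(s2)
    return y_hat, s2, (y_hat - half_width, y_hat + half_width)


# Westwood Company summary figures as quoted in the text.
T_95_8 = 1.860  # t(.95; 8), taken from the text
for x_h in (55, 80):
    y_hat, s2, (low, high) = mean_response_ci(
        10.0, 2.0, 7.5, 10, 50.0, 3400.0, x_h, T_95_8
    )
    print(f"X_h = {x_h}: Y_hat = {y_hat:.1f}, s2 = {s2:.5f}, "
          f"limits = ({low:.1f}, {high:.1f})")
```

Passing the t value in as an argument keeps the sketch free of any statistical library; a table lookup or a library quantile function could supply it instead.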


Comments 


1. Since the $X_i$ are known constants in model (3.1), the interpretation of confidence
intervals and risks of errors in inferences on the mean response is in terms of taking
repeated samples in which the $X$ observations are at the same levels as in the sample
actually taken. We noted this same point earlier in connection with inferences on $\beta_0$ and
$\beta_1$.

2. We see from formula (3.28b) that for given sample results, the variance of $\hat{Y}_h$ is
smallest when $X_h = \bar{X}$. Thus, in an experiment to estimate the mean response at a particular
level $X_h$ of the independent variable, the precision of the estimate will be greatest if
(everything else remaining equal) the observations on $X$ are spaced so that $\bar{X} = X_h$.

3. When the sample size is large, the t value in the confidence limits (3.32) may be
replaced by the standard normal z value, since the t distribution approaches the standard
normal distribution with increasing sample size.

4. The usual relationship between confidence intervals and tests applies in inferences
concerning the mean response. Thus, the two-sided confidence limits (3.32) can be utilized
for two-sided tests concerning the mean response at $X_h$. Alternatively, a regular
decision rule can be set up.

5. Confidence limits (3.32) apply when a single mean response is to be estimated from
the sample. We discuss in Chapter 5 how to proceed when a number of mean responses
are to be estimated from the same sample.


3.5  PREDICTION  OF  NEW  OBSERVATION 

We consider now the prediction of a new observation $Y$ corresponding to a
given level $X$ of the independent variable. In our Westwood Company illustration,
for instance, the next lot to be produced consists of 55 parts and management
wishes to predict the number of man-hours for this particular lot. As another
example, an economist has estimated the regression relation between
company sales and number of persons 16 or more years old, based on data for the
past 10 years. Given a reliable demographic projection of the number of persons
16 or more years old for next year, the economist wishes to predict next year's
company sales.




The new observation on $Y$ is viewed as the result of a new trial, independent
of the trials on which the regression analysis is based. We shall denote the level
of $X$ for the new trial as $X_h$ and the new observation on $Y$ as $Y_{h(\text{new})}$. Of course, we
assume that the underlying regression model applicable for the basic sample data
continues to be appropriate for the new observation.

The distinction between estimation of the mean response $E(Y_h)$, discussed in
the preceding section, and prediction of a new response $Y_{h(\text{new})}$, discussed now,
is basic. In the former case, we estimate the mean of the distribution of $Y$. In the
present case, we predict an individual outcome drawn from the distribution of $Y$.
Of course, the great majority of individual outcomes deviate from the mean
response, and this must be allowed for in the procedure for predicting $Y_{h(\text{new})}$.


Prediction  interval  when  parameters  known 

To illustrate the nature of a prediction interval for a new observation $Y_{h(\text{new})}$ in
as simple a fashion as possible, we shall first assume that all regression parameters
are known. Later we shall drop this assumption and make appropriate modifications.

Suppose that the Westwood Company plans to produce a lot of $X_h = 40$ parts
in a few weeks and that the relevant parameters of the regression model are
known to be:

$\beta_0 = 9.5 \qquad \beta_1 = 2.1$

$E(Y) = 9.5 + 2.1X$

$\sigma^2 = 10.0$

Thus, for $X_h = 40$ parts, we have:

$E(Y_{40}) = 9.5 + 2.1(40) = 93.5$

Figure 3.3 shows the probability distribution of $Y$ for $X_h = 40$ parts. Its mean is
$E(Y_{40}) = 93.5$, and its standard deviation is $\sigma = \sqrt{10.0} = 3.162$. Further, the
distribution is normal in accord with model (3.1).

Suppose we were to predict that the number of man-hours for the next lot of
$X_h = 40$ parts will be between:

$E(Y_{40}) \pm 3\sigma$

$93.5 \pm 3(3.162)$

so that the prediction interval would be:

$84.0 \le Y_{40(\text{new})} \le 103.0$

Since 99.7 percent of the area in a normal probability distribution falls within
three standard deviations from the mean, the probability is .997 that this prediction
interval will give a correct prediction for the next production run of 40 parts.
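The arithmetic of this known-parameter case can be reproduced with the Python standard library. This is a minimal sketch under the values stated in the text ($\beta_0 = 9.5$, $\beta_1 = 2.1$, $\sigma^2 = 10.0$, $X_h = 40$); the helper name `known_parameter_limits` is hypothetical.

```python
from statistics import NormalDist


def known_parameter_limits(mean, sigma, z):
    """Prediction limits mean +/- z * sigma when all parameters are known."""
    return mean - z * sigma, mean + z * sigma


beta0, beta1, sigma2 = 9.5, 2.1, 10.0    # parameter values given in the text
x_h = 40

mean_40 = beta0 + beta1 * x_h            # E(Y_40)
sigma = sigma2 ** 0.5                    # standard deviation of Y

lower, upper = known_parameter_limits(mean_40, sigma, 3.0)

# Probability that a normal variable falls within 3 standard deviations of its mean.
std_normal = NormalDist()
coverage = std_normal.cdf(3.0) - std_normal.cdf(-3.0)

print(f"E(Y_40) = {mean_40:.1f}, sigma = {sigma:.3f}")
print(f"limits = ({lower:.1f}, {upper:.1f}), coverage = {coverage:.3f}")
```

For a general $1 - \alpha$ interval, the multiplier 3 would be replaced by the standard normal quantile, which `NormalDist().inv_cdf(1 - alpha / 2)` provides.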

The basic idea of a prediction interval is thus to choose a range in the distribution
of $Y$ wherein most of the observations will fall, and to declare that the next
observation will fall in this range. The usefulness of the prediction interval depends,
as always, on the width of the interval and the needs for precision by the
user.

FIGURE 3.3  Prediction of $Y_{h(\text{new})}$ when parameters known

In general, when the regression parameters are known, the $1 - \alpha$ prediction
limits for $Y_{h(\text{new})}$ are:

(3.33)  $E(Y_h) \pm z(1 - \alpha/2)\,\sigma$

In centering the limits around $E(Y_h)$, we obtain the narrowest interval consistent
with the specified probability of a correct prediction.

Prediction interval for $Y_{h(\text{new})}$ when parameters unknown

When the regression parameters are unknown, they must be estimated. The
mean of the distribution of $Y$ is estimated by $\hat{Y}_h$, as usual, and the variance of the
distribution of $Y$ is estimated by MSE. We cannot, however, simply use the
prediction limits (3.33) with the parameters replaced by the corresponding point
estimators. The reason is illustrated intuitively in Figure 3.4. Shown there are
two probability distributions of $Y$, corresponding to the upper and lower limits of
a confidence interval for $E(Y_h)$. In other words, the distribution of $Y$ could be
located as far left as the one shown, as far right as the other one shown, or
anywhere in between. Since we do not know the mean $E(Y_h)$ and only estimate it
by a confidence interval, we cannot be certain of the location of the distribution
of $Y$.

FIGURE 3.4  Prediction of $Y_{h(\text{new})}$ when parameters unknown

Figure 3.4 also shows the prediction limits for each of the two probability
distributions of $Y$ presented there. Since we cannot be certain of the location of
the distribution of $Y$, prediction limits for $Y_{h(\text{new})}$ clearly must take account of two
elements, as shown in Figure 3.4:

1. Variation in possible location of the distribution of $Y$.

2. Variation within the probability distribution of $Y$.

Prediction limits for a new observation $Y$ at a given level $X_h$ are obtained by
means of the following theorem:

(3.34)  $\dfrac{\hat{Y}_h - Y_{h(\text{new})}}{s(Y_{h(\text{new})})}$ is distributed as $t(n - 2)$ for model (3.1)

Note that the standardized statistic (3.34) uses the point estimator $\hat{Y}_h$ in the numerator
rather than the true mean $E(Y_h)$ because the true mean is unknown and
cannot be used in making a prediction. The estimated standard deviation
$s(Y_{h(\text{new})})$ in the denominator of the standardized statistic will be defined shortly.

From theorem (3.34), it follows in the usual fashion that the $1 - \alpha$ prediction
limits for a new observation are [for instance, compare (3.34) to (3.10) and relate
$\hat{Y}_h$ to $b_1$ and $Y_{h(\text{new})}$ to $\beta_1$]:

(3.35)  $\hat{Y}_h \pm t(1 - \alpha/2;\ n - 2)\, s(Y_{h(\text{new})})$




The variance of the numerator of the standardized statistic (3.34) is readily
obtained, utilizing the independence of the new observation $Y_{h(\text{new})}$ and the original
sample observations on which $\hat{Y}_h$ is based. We shall denote the variance of the
numerator by $\sigma^2(Y_{h(\text{new})})$, and we obtain:

(3.36)  $\sigma^2(Y_{h(\text{new})}) = \sigma^2(\hat{Y}_h - Y_{h(\text{new})}) = \sigma^2(\hat{Y}_h) + \sigma^2$

Note that $\sigma^2(Y_{h(\text{new})})$ has two components:

1. The variance of the sampling distribution of $\hat{Y}_h$.

2. The variance of the distribution of $Y$.
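The step from the difference to the sum of variances can be written out explicitly. The lines below are a sketch of that step, not a quotation from the text; they use only the stated independence of the new observation and the original sample, under which the covariance term is zero, and the form of $\sigma^2(\hat{Y}_h)$ referred to in the text as (3.28b).

```latex
\begin{aligned}
\sigma^2\bigl(\hat{Y}_h - Y_{h(\text{new})}\bigr)
  &= \sigma^2(\hat{Y}_h) + \sigma^2\bigl(Y_{h(\text{new})}\bigr)
     - 2\,\sigma\bigl(\hat{Y}_h,\, Y_{h(\text{new})}\bigr) \\
  &= \sigma^2(\hat{Y}_h) + \sigma^2 - 0 \\
  &= \sigma^2\left[\frac{1}{n}
     + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right] + \sigma^2
\end{aligned}
```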


An unbiased estimator of $\sigma^2(Y_{h(\text{new})})$ is:

(3.37)  $s^2(Y_{h(\text{new})}) = s^2(\hat{Y}_h) + MSE$

which can be expressed, using (3.30), as follows:

(3.37a)  $s^2(Y_{h(\text{new})}) = MSE\left[1 + \dfrac{1}{n} + \dfrac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$


Example. Suppose that the Westwood Company wishes to predict the number
of man-hours required in the forthcoming production run of size 55 with a 90
percent prediction interval, and that the parameter values are unknown. We require
$t(.95;\ 8) = 1.860$. From earlier work, we have:

$\hat{Y}_{55} = 120 \qquad s^2(\hat{Y}_{55}) = .80515 \qquad MSE = 7.5$

Using (3.37), we obtain:

$s^2(Y_{55(\text{new})}) = .80515 + 7.5 = 8.30515$

so that:

$s(Y_{55(\text{new})}) = 2.88187$

Hence, the 90 percent prediction interval for $Y_{55(\text{new})}$ is by (3.35):

$120 - 1.860(2.88187) \le Y_{55(\text{new})} \le 120 + 1.860(2.88187)$

$114.6 \le Y_{55(\text{new})} \le 125.4$

With confidence coefficient .90, we predict that the number of man-hours for the
next production run of 55 parts will be somewhere between 114.6 and 125.4.
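The same example can be recomputed directly from (3.37a). The following is a sketch only: `new_observation_limits` is a hypothetical helper name, the inputs are the Westwood figures quoted in the text, and the t value is again taken from the text rather than computed.

```python
import math


def new_observation_limits(y_hat, mse, n, x_bar, sxx, x_h, t_value):
    """Prediction limits (3.35) for a single new observation at X_h.

    The estimated variance follows (3.37a): the leading 1 inside the bracket
    accounts for the variance of the new observation itself, the remaining
    terms for the sampling variance of Y_hat.
    """
    s2_pred = mse * (1.0 + 1.0 / n + (x_h - x_bar) ** 2 / sxx)
    half_width = t_value * math.sqrt(s2_pred)
    return s2_pred, (y_hat - half_width, y_hat + half_width)


s2_pred, (low, high) = new_observation_limits(
    y_hat=120.0, mse=7.5, n=10, x_bar=50.0, sxx=3400.0, x_h=55, t_value=1.860
)
print(f"s2 = {s2_pred:.5f}, s = {math.sqrt(s2_pred):.5f}")
print(f"90 percent prediction limits: ({low:.1f}, {high:.1f})")
```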

Comments 

1. The 90 percent prediction interval for $Y_{55(\text{new})}$ just obtained is wider than the 90
percent confidence interval for $E(Y_{55})$ obtained earlier in Example 1 of Section 3.4. The reason is that when
predicting a new observation, we encounter both the variability in $\hat{Y}_h$ from sample to
sample and the variation within the probability distribution of $Y$.

2. Formula (3.37a) indicates that the prediction interval is wider the further $X_h$ is from
$\bar{X}$. The reason for this is that the estimate of the mean $\hat{Y}_h$, as noted earlier, is less precise as
$X_h$ is located further away from $\bar{X}$.




3. The confidence coefficient for the prediction limits (3.35) refers to the taking of
repeated samples based on the same set of $X$ values, and calculating prediction limits for
$Y_{h(\text{new})}$ for each sample.

4. Prediction limits lend themselves to statistical control uses. In our example, suppose
that the new production run of 55 parts, for which the prediction limits were 114.6
and 125.4 hours, actually required 135 hours. Management here would have an indication
that a change in the production process may have occurred, and may wish to initiate a
search for the assignable cause.

5. When the sample size is large, the last two terms inside the brackets in (3.37a) are
small compared to 1, the first term in the brackets. Also, of course, the t distribution is
then approximately normal. Hence, approximate $1 - \alpha$ prediction limits for $Y_{h(\text{new})}$ when
$n$ is large are:

(3.38)  $\hat{Y}_h \pm z(1 - \alpha/2)\sqrt{MSE}$

6. Prediction limits (3.35) apply for a single prediction based on the sample data.
Next, we discuss how to predict the mean of several new observations at a given $X_h$; and
in Chapter 5, we take up how to make several predictions at different $X_h$ values.

7. Prediction intervals resemble confidence intervals. However, they differ conceptually.
A confidence interval represents an inference on a parameter, and is an interval
which is intended to cover the value of the parameter. A prediction interval, on the other
hand, is a statement about the value to be taken by a random variable.


Prediction  of  mean  of  m  new  observations  for  given  Xh 

Occasionally,  one  would  like  to  predict  the  mean  of  m  new  observations  on  Y 
for  a  given  level  of  the  independent  variable.  Suppose  the  Westwood  Company 
has  been  asked  to  bid  on  a  contract  that  calls  for  m  =  3  independent  production 
runs  of  Xh  =  55  parts  during  the  next  few  months.  Management  would  like  to 
predict  the  mean  man-hours  per  run  for  these  three  runs,  and  then  convert  this 
into  a  prediction  of  the  total  man-hours  required  to  fill  the  contract. 

We shall denote the mean value of $Y$ to be predicted as $\bar{Y}_{h(\text{new})}$. It can be shown
that the appropriate $1 - \alpha$ prediction limits are:

(3.39)  $\hat{Y}_h \pm t(1 - \alpha/2;\ n - 2)\, s(\bar{Y}_{h(\text{new})})$

where:

(3.39a)  $s^2(\bar{Y}_{h(\text{new})}) = s^2(\hat{Y}_h) + \dfrac{MSE}{m}$

or equivalently:

(3.39b)  $s^2(\bar{Y}_{h(\text{new})}) = MSE\left[\dfrac{1}{m} + \dfrac{1}{n} + \dfrac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$

Note from (3.39a) that the variance $s^2(\bar{Y}_{h(\text{new})})$ has two components:

1. The variance of the sampling distribution of $\hat{Y}_h$.

2. The variance of the mean of $m$ observations from the probability distribution
of $Y$.




Example. In the Westwood Company example, let us find the 90 percent
prediction interval for the mean number of man-hours $\bar{Y}_{h(\text{new})}$ in three new production
runs, each for $X_h = 55$ parts. From previous work, we have:

$\hat{Y}_{55} = 120 \qquad s^2(\hat{Y}_{55}) = .80515$

$MSE = 7.5 \qquad t(.95;\ 8) = 1.860$

Hence, we obtain:

$s^2(\bar{Y}_{55(\text{new})}) = .80515 + \dfrac{7.5}{3} = 3.30515$

or:

$s(\bar{Y}_{55(\text{new})}) = 1.81801$

The prediction interval for the mean man-hours per run then is:

$120 - 1.860(1.81801) \le \bar{Y}_{55(\text{new})} \le 120 + 1.860(1.81801)$

$116.6 \le \bar{Y}_{55(\text{new})} \le 123.4$

Note that these prediction limits are somewhat narrower than those for predicting
the man-hours for a single lot of 55 parts because they involve a prediction
of the mean man-hours for three lots.

We obtain the prediction interval for the total number of man-hours in the
three production runs by multiplying the prediction limits for $\bar{Y}_{55(\text{new})}$ by three:

$349.8 = 3(116.6) \le \text{Total man-hours} \le 3(123.4) = 370.2$

Thus, it can be predicted with 90 percent confidence that between 350 and 370
man-hours will be needed to fill the contract for three lots of 55 parts each.
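A short sketch of this calculation follows, assuming the summary figures quoted above. The function name `mean_of_m_limits` is hypothetical. To match the text, the limits for the total are formed by multiplying the limits for the mean after rounding them to one decimal place; multiplying the unrounded limits would give slightly different end points.

```python
import math


def mean_of_m_limits(y_hat, s2_y_hat, mse, m, t_value):
    """Prediction limits (3.39) for the mean of m new observations at X_h.

    The estimated variance follows (3.39a): s2(Y_hat) + MSE / m.
    """
    s2_mean = s2_y_hat + mse / m
    half_width = t_value * math.sqrt(s2_mean)
    return s2_mean, (y_hat - half_width, y_hat + half_width)


m = 3
s2_mean, (low, high) = mean_of_m_limits(
    y_hat=120.0, s2_y_hat=0.80515, mse=7.5, m=m, t_value=1.860
)

# Limits for the total over the m runs, formed from the rounded per-run limits
# as is done in the text.
total_low = m * round(low, 1)
total_high = m * round(high, 1)

print(f"s2 = {s2_mean:.5f}, s = {math.sqrt(s2_mean):.5f}")
print(f"mean per run: ({low:.1f}, {high:.1f})")
print(f"total for {m} runs: ({total_low:.1f}, {total_high:.1f})")
```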

3.6  CONSIDERATIONS  IN  APPLYING  REGRESSION  ANALYSIS 

We have now discussed the major uses of regression analysis: to make inferences
about the regression parameters, to estimate the mean response for a given
$X$, and to predict a new observation $Y$ for a given $X$. It remains to make a few
cautionary remarks about implementing applications of regression analysis.

1. Frequently, regression analysis is used to make inferences for the future.
For instance, the Westwood Company may wish to estimate expected man-hours
for given lot sizes for purposes of planning future production. In applications of
this type, it is important to remember that the validity of the regression application
depends upon whether basic causal conditions in the period ahead will be
similar to those in existence during the period upon which the regression analysis
is based. This caution applies whether mean responses are to be estimated, new
observations predicted, or regression parameters estimated.

2. In predicting new observations on $Y$, the independent variable $X$ itself
often has to be predicted. For instance, we mentioned earlier the prediction of
company sales for next year from a demographic projection of the number of
persons 16 years of age or older next year. A prediction of company sales under
these circumstances is a conditional prediction, dependent upon the correctness
of the population projection. It is easy to forget the conditional nature of this type
of prediction.

3. Another caution deals with inferences pertaining to levels of the independent
variable which fall outside the range of observations. Unfortunately, this
situation frequently occurs in practice. A company which predicts its sales from a
regression relation of company sales to disposable personal income will often
find the level of disposable personal income of interest (e.g., for the year ahead)
to fall beyond the range of past data. If the $X$ level does not fall far beyond this
range, one may have reasonable confidence in the application of the regression
analysis. On the other hand, if the $X$ level falls far beyond the range of past data,
extreme caution should be exercised since one cannot be sure that the regression
function which fits the past data is appropriate over the wider range of the
independent variable.

4. A statistical test which leads to the conclusion that $\beta_1 \ne 0$ does not establish
a cause-and-effect relation between the independent and dependent variables.
For instance, with nonexperimental data, both the $X$ and $Y$ variables may be
simultaneously influenced by other variables not included in the regression
model. Thus, data on grade school children's vocabulary ($X$) and writing speed
($Y$) may show a clear linear association, but this could be largely the result of a
child's age, amount of education, and similar factors that affect both $X$ and $Y$. On
the other hand, the existence of a regression relation in controlled experiments is
often good evidence of a cause-and-effect relation.

5. Finally, we should note again that special problems arise when one wishes
to estimate the mean response or predict a new observation for a number of
different levels of the independent variable, as is frequently the case. The confidence
coefficients for the limits (3.32) for estimating $E(Y_h)$ and for the prediction
limits (3.35) for a new observation apply for a single level of $X$ for a given
sample. In Chapter 5, we discuss how to make multiple inferences from a given
sample.

3.7  CASE WHEN X IS RANDOM

The normal error model (3.1), which has been used throughout this chapter
and will continue to be used, assumes that the $X$ values are known constants. As
a consequence of this, the confidence coefficients and risks of errors refer to
repeated sampling when the $X$ values are kept the same from sample to sample.

Frequently, it may not be appropriate to consider the $X$ values as known
constants. For instance, consider regressing daily bathing suit sales by a department
store on mean daily temperature. Surely, the department store cannot control
daily temperatures, so it would not be meaningful to think of repeated sampling
when the temperature levels are the same from sample to sample.

In this type of situation, it may be preferable to consider both $Y$ and $X$ as
random variables. Does this mean then that all of our earlier results are not
applicable here? Not at all. It can be shown that all results on estimation, testing,
and prediction obtained for model (3.1) still apply if the following conditions
hold:

(3.40)  1. The conditional distributions of the $Y_i$, given $X_i$, are normal and
           independent, with conditional means $\beta_0 + \beta_1 X_i$ and conditional
           variance $\sigma^2$.

        2. The $X_i$ are independent random variables, whose probability
           distribution $g(X_i)$ does not involve the parameters $\beta_0$, $\beta_1$, $\sigma^2$.

These conditions require only that model (3.1) is appropriate for each conditional
distribution of $Y_i$, and that the probability distribution of $X_i$ does not involve
the regression parameters. If these conditions are met, all earlier results on
estimation, testing, and prediction still hold even though the $X_i$ are now random
variables. The major modification occurs in the interpretation of confidence coefficients
and specified risks of error. When $X$ is random, these refer to repeated
sampling of pairs of $(X_i, Y_i)$ values where the $X_i$ values as well as the $Y_i$ values
change from sample to sample. Thus, in our bathing suit sales illustration, a
confidence coefficient would refer to the proportion of correct interval estimates
if repeated samples of $n$ days' sales and temperatures were obtained and the
confidence interval calculated for each sample. Another modification occurs in
the power of tests, which is different when $X$ is a random variable.
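The repeated-sampling interpretation with random $X$ can be illustrated by simulation. The sketch below is an illustration under assumptions that are not from the text: the $X_i$ are drawn independently from a uniform distribution on 20 to 80, the true parameters are set to $\beta_0 = 10$, $\beta_1 = 2$, $\sigma^2 = 7.5$, and the function names are hypothetical. Each replicate draws new $X$ and $Y$ values, computes the limits (3.32) for $E(Y_h)$ at $X_h = 55$ with $t(.95;\ 8) = 1.860$, and records whether the true mean is covered.

```python
import math
import random


def fit_line(xs, ys):
    """Least squares fit; returns b0, b1, MSE, mean of X, and sum of squares of X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return b0, b1, sse / (n - 2), x_bar, sxx


def coverage_with_random_x(reps=4000, n=10, x_h=55.0, t_value=1.860, seed=0):
    """Proportion of 90 percent intervals for E(Y_h) that cover the true mean."""
    rng = random.Random(seed)
    beta0, beta1, sigma = 10.0, 2.0, math.sqrt(7.5)   # assumed true values
    true_mean = beta0 + beta1 * x_h
    hits = 0
    for _ in range(reps):
        # Both X and Y change from sample to sample.
        xs = [rng.uniform(20.0, 80.0) for _ in range(n)]
        ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
        b0, b1, mse, x_bar, sxx = fit_line(xs, ys)
        y_hat = b0 + b1 * x_h
        half_width = t_value * math.sqrt(mse * (1.0 / n + (x_h - x_bar) ** 2 / sxx))
        if y_hat - half_width <= true_mean <= y_hat + half_width:
            hits += 1
    return hits / reps


print(f"estimated coverage: {coverage_with_random_x():.3f}")
```

The distribution chosen for $X$ here does not involve the regression parameters, in line with condition 2 of (3.40), so the estimated coverage should be close to the nominal .90 apart from simulation error.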


3.8  ANALYSIS  OF  VARIANCE  APPROACH  TO 
REGRESSION  ANALYSIS 

We  now  have  developed  the  basic  regression  model  and  demonstrated  its 
major  uses.  At  this  point,  we  will  view  the  relationships  in  a  regression  model  in 
a  somewhat  different  way  than  before.  This  new  perspective  will  not  enable  us  to 
do  anything  new  with  the  basic  regression  model,  but  it  will  come  into  its  own 
when  we  take  up  more  complex  regression  models  and  additional  types  of  linear 
statistical  models. 


Partitioning  of  total  sum  of  squares 

Basic notions. The analysis of variance approach is based on the partitioning
of sums of squares and degrees of freedom associated with the response variable
$Y$. To explain the motivation of this approach, consider again the Westwood
Company lot size example. Figure 3.5a shows the man-hours required for the 10
production runs presented earlier in Table 2.1. There is variation in the number of
man-hours, as in all statistical data. Indeed, if all observations $Y_i$ were identically
equal, in which case $Y_i \equiv \bar{Y}$, there would be no statistical problems. The variation
of the $Y_i$ is conventionally measured in terms of the deviations:

(3.41)  $Y_i - \bar{Y}$




FIGURE 3.5  Partitioning of deviations $Y_i - \bar{Y}$ ($Y$ values not plotted to scale)


These deviations are shown in Figure 3.5a, and one is labeled explicitly. The
measure of total variation, denoted by SSTO, is the sum of the squared deviations
(3.41):

(3.42)  $SSTO = \sum (Y_i - \bar{Y})^2$

Here SSTO stands for total sum of squares. If SSTO = 0, all observations are the
same. The greater is SSTO, the greater is the variation among the $Y$ observations.




When we utilize the regression approach, the variation reflecting the uncertainty
in the data is that of the $Y$ observations around the regression line:

(3.43)  $Y_i - \hat{Y}_i$

These deviations are shown in Figure 3.5b. The measure of variation in the data
with the regression model is the sum of the squared deviations (3.43), which is
the familiar SSE of (2.21):

(3.44)  $SSE = \sum (Y_i - \hat{Y}_i)^2$

Again, SSE denotes error sum of squares. If SSE = 0, all observations fall on the
fitted regression line. The larger is SSE, the greater is the variation of the $Y$
observations around the regression line.

For the Westwood Company example, we know from earlier work (Table 3.1)
that:

SSTO = 13,660
SSE = 60

What accounts for the substantial difference between these two sums of squares?
The difference, as we shall show shortly, is another sum of squares:

(3.45)  $SSR = \sum (\hat{Y}_i - \bar{Y})^2$

where SSR stands for regression sum of squares. Note that SSR is a sum of
squared deviations, the deviations being:

(3.46)  $\hat{Y}_i - \bar{Y}$

These deviations are shown in Figure 3.5c. Each deviation is simply the difference
between the fitted value on the regression line and the mean of the fitted
values $\bar{Y}$. [Recall from (2.18) that the mean of the fitted values $\hat{Y}_i$ is $\bar{Y}$.] If the
regression line is horizontal so that $\hat{Y}_i - \bar{Y} \equiv 0$, SSR = 0. Otherwise, SSR is
positive.

SSR may be considered a measure of the variability of the $Y$'s associated with
the regression line. The larger is SSR in relation to SSTO, the greater is the effect
of the regression relation in accounting for the total variation in the $Y$ observations.

For  our  lot  size  example,  we  have: 

SSR  =  SSTO  -  SSE  =  13,660  -  60  =  13,600 

which  indicates  that  most  of  the  total  variability  in  man-hours  is  accounted  for  by 
the  relation  between  lot  size  and  man-hours. 
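The three sums of squares can be computed from their definitions (3.42), (3.44), and (3.45). The sketch below is illustrative only. The ten (lot size, man-hours) pairs are an assumption: they are not listed in this section and are taken to be the Table 2.1 data; they do reproduce the summary figures quoted in the text ($b_0 = 10.0$, $b_1 = 2.0$, SSTO = 13,660, SSE = 60). The function name is hypothetical.

```python
# Assumed to match the Westwood Company data of Table 2.1 (not shown in this
# section); the values reproduce the summary results quoted in the text.
LOT_SIZE = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
MAN_HOURS = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]


def partition_sums_of_squares(xs, ys):
    """Return (SSTO, SSR, SSE) from the definitional formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]

    ssto = sum((y - y_bar) ** 2 for y in ys)                  # (3.42)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))       # (3.44)
    ssr = sum((f - y_bar) ** 2 for f in fitted)               # (3.45)
    return ssto, ssr, sse


ssto, ssr, sse = partition_sums_of_squares(LOT_SIZE, MAN_HOURS)
print(f"SSTO = {ssto:,.0f}, SSR = {ssr:,.0f}, SSE = {sse:,.0f}")
print(f"SSR + SSE = {ssr + sse:,.0f}")
```

Here SSR is computed from its own definition rather than as SSTO minus SSE, so the agreement of the two routes serves as a numerical check.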


Formal development of partitioning. Consider the deviation $Y_i - \bar{Y}$, the
basic quantity measuring the variation of the observations $Y_i$. We can decompose
this deviation as follows:

(3.47)  $\underbrace{Y_i - \bar{Y}}_{\text{Total deviation}} = \underbrace{\hat{Y}_i - \bar{Y}}_{\substack{\text{Deviation of fitted} \\ \text{regression value} \\ \text{around mean}}} + \underbrace{Y_i - \hat{Y}_i}_{\substack{\text{Deviation around} \\ \text{regression line}}}$

Thus, the total deviation $Y_i - \bar{Y}$ can be viewed as the sum of two components:

1. The deviation of the fitted value $\hat{Y}_i$ around the mean $\bar{Y}$.

2. The deviation of $Y_i$ around the regression line.

Figure 3.5d shows this decomposition for one of the observations.

It is a remarkable property that the sums of these squared deviations have the
same relationship:

(3.48)  $\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$

or, using the notation in (3.42), (3.44), and (3.45):

(3.48a)  $SSTO = SSR + SSE$

To prove this basic result in the analysis of variance, we proceed as follows:

$\sum (Y_i - \bar{Y})^2 = \sum [(\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)]^2$

$\qquad = \sum [(\hat{Y}_i - \bar{Y})^2 + (Y_i - \hat{Y}_i)^2 + 2(\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i)]$

$\qquad = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2 + 2\sum (\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i)$

The last term on the right is zero, as can be seen by expanding it out:

$2\sum (\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i) = 2\sum \hat{Y}_i (Y_i - \hat{Y}_i) - 2\bar{Y}\sum (Y_i - \hat{Y}_i)$

The first summation on the right is zero by (2.20), and the second is zero by
(2.17). Hence, (3.48) follows.


Computational formulas. The definitional formulas for SSTO, SSR, and
SSE presented above are often not convenient for hand computation. Useful
computational formulas for SSTO and SSR, which are algebraically equivalent to
the definitional formulas, are:

(3.49)  $SSTO = \sum Y_i^2 - \dfrac{(\sum Y_i)^2}{n} = \sum Y_i^2 - n\bar{Y}^2$

(3.50a)  $SSR = b_1\left[\sum X_i Y_i - \dfrac{(\sum X_i)(\sum Y_i)}{n}\right] = b_1\left[\sum (X_i - \bar{X})(Y_i - \bar{Y})\right]$

or:

(3.50b)  $SSR = b_1^2 \sum (X_i - \bar{X})^2$

Computational formulas for SSE were given earlier in (2.24).




Using  the  results  for  the  Westwood  Company  example  summarized  in  Table 
3.1,  we  obtain  for  SSR  by  (3.50a): 

SSR  =  2.0(6,800)  =  13,600 

This,  of  course,  is  the  same  result  obtained  previously  by  taking  the  difference 
SSTO  —  SSE,  except  sometimes  for  a  slight  difference  due  to  rounding  effects. 
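The computational formulas can be compared with one another numerically. The sketch below uses raw sums only, as one would in hand computation. As before, the ten data pairs are an assumption (taken to be the Table 2.1 data, which are not listed in this section), and the variable names are hypothetical.

```python
# Assumed Westwood Company data (taken to match Table 2.1, not shown here).
xs = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
ys = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]
n = len(xs)

# Raw sums used by the computational formulas.
sum_x = sum(xs)
sum_y = sum(ys)
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))

sxy = sum_xy - sum_x * sum_y / n      # equals sum of (X_i - X_bar)(Y_i - Y_bar)
sxx = sum_x2 - sum_x ** 2 / n         # equals sum of (X_i - X_bar)^2
b1 = sxy / sxx

ssto = sum_y2 - sum_y ** 2 / n        # (3.49)
ssr_a = b1 * sxy                      # (3.50a)
ssr_b = b1 ** 2 * sxx                 # (3.50b)

print(f"b1 = {b1:.1f}, SSTO = {ssto:,.0f}")
print(f"SSR by (3.50a) = {ssr_a:,.0f}, SSR by (3.50b) = {ssr_b:,.0f}")
```

With data that are not whole numbers, the two SSR formulas can differ in the last digits because of rounding, which is the effect the text mentions.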


Breakdown  of  degrees  of  freedom 

Corresponding to the partitioning of the total sum of squares SSTO, there is a
partitioning of the associated degrees of freedom (abbreviated df). We have
$n - 1$ degrees of freedom associated with SSTO. One degree of freedom is lost
because the deviations $Y_i - \bar{Y}$ are not independent in that they must sum to zero.
Equivalently, one degree of freedom is lost because the sample mean $\bar{Y}$ is used to
estimate the population mean.

SSE, as noted earlier, has $n - 2$ degrees of freedom associated with it. Two
degrees of freedom are lost because the two parameters $\beta_0$ and $\beta_1$ were estimated
in obtaining the fitted values $\hat{Y}_i$.

SSR has one degree of freedom associated with it. There are two parameters in
the regression equation, but the deviations $\hat{Y}_i - \bar{Y}$ are not independent because
they must sum to zero; hence, one degree of freedom is lost from the possible
degrees of freedom.

Note that the degrees of freedom are additive:

$n - 1 = 1 + (n - 2)$

For our Westwood Company example, these degrees of freedom are:

$9 = 1 + 8$


Mean  squares 

A sum of squares divided by its associated degrees of freedom is called a
mean square (abbreviated MS). For instance, an ordinary sample variance is a
mean square since a sum of squares, $\sum (Y_i - \bar{Y})^2$, is divided by its associated
degrees of freedom, $n - 1$. We are interested here in the regression mean
square, denoted by MSR:

(3.51)  $MSR = \dfrac{SSR}{1} = SSR$

and in the error mean square, MSE, defined earlier in (2.22):

(3.52)  $MSE = \dfrac{SSE}{n - 2}$

For our Westwood Company example, we have SSR = 13,600 and SSE = 60.
Hence:

$MSR = \dfrac{13{,}600}{1} = 13{,}600$

Also, we obtained earlier:

$MSE = \dfrac{60}{8} = 7.5$

Note

The two mean squares MSR and MSE do not add to $SSTO \div (n - 1) = 13{,}660 \div 9 = 1{,}518$.
Thus, mean squares are not additive.


Analysis  of  variance  table 

Basic table. The breakdowns of the total sum of squares and associated
degrees of freedom are displayed in the form of an analysis of variance table
(ANOVA table) in Table 3.2. Mean squares of interest also are shown. In addition,
there is a column of expected mean squares which will be utilized below.
The ANOVA table for our Westwood Company example is shown in Table 3.3.


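TABLE 3.2  ANOVA table for simple regression

Source of
Variation    SS                                      df        MS                          E(MS)

Regression   $SSR = \sum (\hat{Y}_i - \bar{Y})^2$    1         $MSR = \frac{SSR}{1}$       $\sigma^2 + \beta_1^2 \sum (X_i - \bar{X})^2$
Error        $SSE = \sum (Y_i - \hat{Y}_i)^2$        $n - 2$   $MSE = \frac{SSE}{n - 2}$   $\sigma^2$

Total        $SSTO = \sum (Y_i - \bar{Y})^2$         $n - 1$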

TABLE 3.3  ANOVA table for Westwood Company example

Source of
Variation      SS        df     MS

Regression     13,600    1      13,600
Error              60    8         7.5

Total          13,660    9
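The entries of Table 3.3 can be checked directly from the ten Westwood Company observations listed in Figure 3.7. The Python sketch below is illustrative only: the helper name `anova_simple` and the dictionary layout it returns are assumptions made here, not notation from the text.

```python
# Minimal sketch: basic ANOVA table for simple linear regression.
# Data: the 10 Westwood Company observations (lot size X, man-hours Y)
# as listed in the computer printout of Figure 3.7.
X = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
Y = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]


def anova_simple(x, y):
    """Return SS, df and MS by source of variation (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # least squares slope
    b0 = y_bar - b1 * x_bar        # least squares intercept
    fitted = [b0 + b1 * xi for xi in x]
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ssto = sum((yi - y_bar) ** 2 for yi in y)
    return {
        "regression": {"SS": ssr, "df": 1, "MS": ssr / 1},
        "error": {"SS": sse, "df": n - 2, "MS": sse / (n - 2)},
        "total": {"SS": ssto, "df": n - 1},
    }


table = anova_simple(X, Y)
for source, row in table.items():
    print(source, row)
```

The sums of squares and the degrees of freedom each add across the first two rows to give the total row, while the two mean squares do not add to SSTO/(n - 1).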


Modified table. Sometimes, an ANOVA table showing one additional element
of decomposition is utilized. Recall that by (3.49):


90  /  Inferences  in  regression  analysis 


$SSTO = \sum (Y_i - \bar{Y})^2 = \sum Y_i^2 - n\bar{Y}^2$

In the modified ANOVA table, the total uncorrected sum of squares, denoted by
SSTOU, is defined as:

(3.53)  $SSTOU = \sum Y_i^2$

and the correction for the mean sum of squares, denoted by SS(correction for
mean), is defined as:

(3.54)  $SS(\text{correction for mean}) = n\bar{Y}^2$

Table 3.4 shows this modified ANOVA table. The general format is presented
in part (a) and the Westwood Company results in part (b). Both types of ANOVA
tables are widely used. Ordinarily, we shall utilize the basic type of table.


TABLE 3.4  Modified ANOVA table for simple regression and results for Westwood
Company example

(a) General

Source of Variation     SS                                              df        MS

Regression              $SSR = \sum (\hat{Y}_i - \bar{Y})^2$            1         $MSR = \frac{SSR}{1}$
Error                   $SSE = \sum (Y_i - \hat{Y}_i)^2$                $n - 2$   $MSE = \frac{SSE}{n - 2}$

Total                   $SSTO = \sum (Y_i - \bar{Y})^2$                 $n - 1$
Correction for mean     $SS(\text{correction for mean}) = n\bar{Y}^2$   1

Total, uncorrected      $SSTOU = \sum Y_i^2$                            $n$

(b) Westwood Company Example

Source of Variation     SS         df     MS

Regression               13,600     1     13,600
Error                        60     8        7.5

Total                    13,660     9
Correction for mean     121,000     1

Total, uncorrected      134,660    10
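The extra rows of the modified table follow from (3.53) and (3.54) alone. As a small check, the sketch below recomputes them from the man-hours observations listed in Figure 3.7; the variable names are chosen here for illustration.

```python
# Minimal sketch: uncorrected total and correction for the mean,
# using the Westwood Company man-hours observations from Figure 3.7.
Y = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]

n = len(Y)
y_bar = sum(Y) / n

sstou = sum(yi ** 2 for yi in Y)     # (3.53): total uncorrected sum of squares
ss_correction = n * y_bar ** 2       # (3.54): correction for the mean
ssto = sstou - ss_correction         # (3.49): corrected total sum of squares

print("SSTOU =", sstou)
print("SS(correction for mean) =", ss_correction)
print("SSTO =", ssto)
```

The degrees of freedom follow the same pattern: n = 10 for the uncorrected total, 1 for the correction, and n - 1 = 9 for SSTO.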

Expected  mean  squares 

We  now  find  the  expected  value  of  each  of  the  mean  squares  in  the  ANOVA 
table  so  that  we  can  know  what  quantity  each  mean  square  estimates. 




We stated earlier that MSE is an unbiased estimator of the error variance $\sigma^2$:

(3.55)  $E(MSE) = \sigma^2$

This follows from theorem (3.11), which states that $SSE/\sigma^2 \sim \chi^2(n - 2)$ for
model (3.1). Hence, it follows from property (1.37) of the chi-square distribution
that:

$E\left(\frac{SSE}{\sigma^2}\right) = n - 2$

or that:

$E\left(\frac{SSE}{n - 2}\right) = E(MSE) = \sigma^2$


To find the expected value of MSR, we begin with (3.50b):

$SSR = b_1^2 \sum (X_i - \bar{X})^2$

Now by (1.14a), we have:

(3.56)  $\sigma^2(b_1) = E(b_1^2) - [E(b_1)]^2$

We know from (3.3a) that $E(b_1) = \beta_1$, and from (3.3b) that:

$\sigma^2(b_1) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}$

Hence, substituting into (3.56), we obtain:

$E(b_1^2) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2} + \beta_1^2$

It now follows that:

$E(SSR) = E(b_1^2) \sum (X_i - \bar{X})^2 = \sigma^2 + \beta_1^2 \sum (X_i - \bar{X})^2$

Finally, E(MSR) is:

(3.57)  $E(MSR) = E\left(\frac{SSR}{1}\right) = \sigma^2 + \beta_1^2 \sum (X_i - \bar{X})^2$

Table 3.2 contains the expected mean squares which we have just derived.

Comments 

1. The expectation of MSE is $\sigma^2$, whether or not X and Y are linearly related, i.e.,
whether or not $\beta_1 = 0$.

2. The expectation of MSR is also $\sigma^2$ when $\beta_1 = 0$. On the other hand, when $\beta_1 \neq 0$,
E(MSR) is greater than $\sigma^2$ since the term $\beta_1^2 \sum (X_i - \bar{X})^2$ in (3.57) then must be positive.
Thus, for testing whether or not $\beta_1 = 0$, a comparison of MSR and MSE suggests itself. If
MSR and MSE are of the same order of magnitude, this would suggest that $\beta_1 = 0$. On the
other hand, if MSR is substantially greater than MSE, this would suggest that $\beta_1 \neq 0$. This
indeed is the basic idea underlying the analysis of variance test to be discussed next.
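The expected mean squares in (3.55) and (3.57) can be checked by simulation. The sketch below is one way to do so under stated assumptions: the X levels are the Westwood lot sizes from Figure 3.7, while the parameter values `BETA0`, `BETA1`, and `SIGMA`, the seed, and the number of replications are arbitrary choices made here for illustration only.

```python
import random

# Minimal simulation sketch: average MSR and MSE over many samples drawn
# from a normal error regression model with fixed X levels.
X = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]   # lot sizes from Figure 3.7
BETA0, BETA1, SIGMA = 10.0, 0.1, 5.0           # assumed, purely illustrative
N_REPS = 20000


def mean_squares(x, y):
    """Return (MSR, MSE) for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    ssr = b1 ** 2 * sxx                          # (3.50b)
    ssto = sum((yi - y_bar) ** 2 for yi in y)
    sse = ssto - ssr
    return ssr / 1, sse / (n - 2)


rng = random.Random(12345)
sum_msr = 0.0
sum_mse = 0.0
for _ in range(N_REPS):
    y = [BETA0 + BETA1 * xi + rng.gauss(0.0, SIGMA) for xi in X]
    msr, mse = mean_squares(X, y)
    sum_msr += msr
    sum_mse += mse

x_bar = sum(X) / len(X)
sxx = sum((xi - x_bar) ** 2 for xi in X)
expected_mse = SIGMA ** 2                        # (3.55)
expected_msr = SIGMA ** 2 + BETA1 ** 2 * sxx     # (3.57)
avg_mse = sum_mse / N_REPS
avg_msr = sum_msr / N_REPS

print("average MSE:", avg_mse, "theory:", expected_mse)
print("average MSR:", avg_msr, "theory:", expected_msr)
```

Setting `BETA1 = 0.0` in this sketch should bring the average MSR down to about the same level as the average MSE, which is the comparison the F test exploits.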




F test of $\beta_1 = 0$ versus $\beta_1 \neq 0$

The  general  analysis  of  variance  approach  provides  us  with  a  battery  of  highly 
useful  tests  for  regression  models  (and  other  linear  statistical  models).  For  the 
simple  regression  case  considered  here,  the  analysis  of  variance  provides  us  with 
a  test  for: 


(3.58)  $H_0: \beta_1 = 0$
        $H_a: \beta_1 \neq 0$


Test statistic. The test statistic for the analysis of variance approach is denoted
by F*. As just mentioned, it compares MSR and MSE in the following
fashion:

(3.59)  $F^* = \frac{MSR}{MSE}$

The earlier motivation, based on the expected mean squares in Table 3.2, suggests
that large values of F* support $H_a$ and values of F* near 1 support $H_0$. In
other words, the appropriate test is an upper-tail one.


Distribution of F*. In order to be able to construct a statistical decision rule
and examine its properties, we need to know the sampling distribution of F*. We
begin by considering the sampling distribution of F* when $H_0$ ($\beta_1 = 0$) holds.
The famous Cochran’s theorem will be most helpful in this connection. For our
purposes, this theorem can be put as follows:

(3.60)  If all n observations $Y_i$ come from the same normal distribution
        with mean $\mu$ and variance $\sigma^2$, and SSTO is decomposed into k
        sums of squares $SS_r$, each with degrees of freedom $df_r$, then the
        $SS_r/\sigma^2$ terms are independent $\chi^2$ variables with $df_r$ degrees of
        freedom if:

        $\sum_{r=1}^{k} df_r = n - 1$

Note from Table 3.2 that we have decomposed SSTO into the two sums of
squares SSR and SSE, and that their degrees of freedom are additive. Hence:

If $\beta_1 = 0$ so that all $Y_i$ have the same mean $\mu = \beta_0$ and the same variance
$\sigma^2$, $SSE/\sigma^2$ and $SSR/\sigma^2$ are independent $\chi^2$ variables.

Now consider the test statistic F*, which we can write as follows:

$F^* = \frac{SSR/\sigma^2}{1} \div \frac{SSE/\sigma^2}{n - 2} = \frac{MSR}{MSE}$



But by Cochran’s theorem, we have when $H_0$ holds:

$F^* \sim \frac{\chi^2(1)}{1} \div \frac{\chi^2(n - 2)}{n - 2}$

where the $\chi^2$ variables are independent. Thus, when $H_0$ holds, F* is the ratio of
two independent $\chi^2$ variables, each divided by its degrees of freedom. But this is
the definition of an F random variable in (1.42).

We have thus established that if $H_0$ holds, F* follows the F distribution,
specifically the $F(1, n - 2)$ distribution.

When $H_a$ holds, it can be shown that F* follows the noncentral F distribution,
a complex distribution that we need not consider further at this time.

Note 

SSR and SSE are independent and $SSE/\sigma^2 \sim \chi^2$ even if $\beta_1 \neq 0$. But that both $SSR/\sigma^2$
and $SSE/\sigma^2$ are $\chi^2$ random variables requires that $\beta_1 = 0$.
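One way to see the F(1, n - 2) result at work is to simulate F* when $\beta_1 = 0$ and count how often it exceeds the tabled percentile. The sketch below assumes the Westwood X levels from Figure 3.7 with n = 10, and uses the value F(.95; 1, 8) = 5.32 quoted in the text from Table A-4; the intercept, error standard deviation, seed, and replication count are arbitrary illustrative choices.

```python
import random

# Minimal simulation sketch: sampling behavior of F* = MSR/MSE under H0.
X = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]   # lot sizes from Figure 3.7
F_CRIT = 5.32        # F(.95; 1, 8) as quoted in the text from Table A-4
N_REPS = 20000       # assumed number of replications


def f_star(x, y):
    """Return F* = MSR / MSE for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ssr = sxy ** 2 / sxx                     # equals b1^2 * Sxx
    ssto = sum((yi - y_bar) ** 2 for yi in y)
    sse = ssto - ssr
    return (ssr / 1) / (sse / (n - 2))


rng = random.Random(2024)
rejections = 0
for _ in range(N_REPS):
    # beta1 = 0: every Y has the same mean (50, assumed) and variance (3^2, assumed)
    y = [50.0 + rng.gauss(0.0, 3.0) for _ in X]
    if f_star(X, y) > F_CRIT:
        rejections += 1

rate = rejections / N_REPS
print("proportion of F* values above 5.32:", rate)
```

If F* follows F(1, 8) under $H_0$, the printed proportion should be close to .05.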


Construction of decision rule. Since the test is upper-tailed and F* is distributed
as $F(1, n - 2)$ when $H_0$ holds, the decision rule is as follows when the
risk of a Type I error is to be controlled at $\alpha$:

(3.61)  If $F^* \le F(1 - \alpha; 1, n - 2)$, conclude $H_0$
        If $F^* > F(1 - \alpha; 1, n - 2)$, conclude $H_a$

where $F(1 - \alpha; 1, n - 2)$ is the $(1 - \alpha)100$ percentile of the appropriate F
distribution.


Example. Using our Westwood Company lot size example again, let us
repeat the earlier test on $\beta_1$. This time we will use the F test. The alternative
conclusions are:

$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

As before, let $\alpha = .05$. Since n = 10, we require F(.95; 1, 8). We find from
Table A-4 in the Appendix that F(.95; 1, 8) = 5.32. The decision rule is:

If $F^* \le 5.32$, conclude $H_0$
If $F^* > 5.32$, conclude $H_a$

We have from Table 3.3 that MSR = 13,600 and MSE = 7.5. Hence, F* is:

$F^* = \frac{13{,}600}{7.5} = 1{,}813$

Since F* = 1,813 > 5.32, we conclude $H_a$, that $\beta_1 \neq 0$, or that there is a
linear association between man-hours and lot size. This is the same result as
when the t test was employed, as it must be according to our discussion below.

The P-value for the test statistic is the probability P[F(1, 8) > F* = 1,813].
From Table A-4 we see that this P-value is less than .001 since F(.999; 1, 8) =
25.4.
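The decision rule in this example reduces to one division and one comparison. A minimal sketch follows, in which the function name `f_test` is an assumption made here and the critical value is the tabled F(.95; 1, 8) = 5.32 quoted above rather than something computed by the code.

```python
# Minimal sketch: upper-tail F test of beta1 = 0 for the Westwood example.
def f_test(msr, mse, f_crit):
    """Return (F*, conclusion) under the upper-tail decision rule (3.61)."""
    f_star = msr / mse
    conclusion = "Ha" if f_star > f_crit else "H0"
    return f_star, conclusion


MSR = 13600.0    # from Table 3.3
MSE = 7.5        # from Table 3.3
F_CRIT = 5.32    # F(.95; 1, 8), read from Table A-4 in the text

f_star, conclusion = f_test(MSR, MSE, F_CRIT)
print("F* =", round(f_star, 1), "-> conclude", conclusion)
```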




Equivalence of F test and t test. For a given $\alpha$ level, the F test of $\beta_1 = 0$
versus $\beta_1 \neq 0$ is equivalent algebraically to the two-tailed t test. To see this,
recall from (3.50b) that:

$SSR = b_1^2 \sum (X_i - \bar{X})^2$

Thus, we can write:

$F^* = \frac{SSR \div 1}{SSE \div (n - 2)} = \frac{b_1^2 \sum (X_i - \bar{X})^2}{MSE}$

But since $s^2(b_1) = MSE / \sum (X_i - \bar{X})^2$, we obtain:

(3.62)  $F^* = \frac{b_1^2}{s^2(b_1)} = \left[\frac{b_1}{s(b_1)}\right]^2$

Now, we know from earlier discussion that the t* statistic for testing whether
or not $\beta_1 = 0$ is by (3.17):

$t^* = \frac{b_1}{s(b_1)}$

In squaring, we obtain the expression for F* in (3.62). Thus:

$(t^*)^2 = \left[\frac{b_1}{s(b_1)}\right]^2 = F^*$

In our illustrative problem, we just calculated that F* = 1,813. From earlier
work, we have: $b_1 = 2.0$, $s(b_1) = .04697$. Thus:

$(t^*)^2 = \left[\frac{2.0}{.04697}\right]^2 = 1{,}813$

Corresponding to the relation between t* and F*, we have the following
relation between the required percentiles of the t and F distributions in the tests:
$[t(1 - \alpha/2; n - 2)]^2 = F(1 - \alpha; 1, n - 2)$. In our tests on $\beta_1$, these percentiles
were: $[t(.975; 8)]^2 = (2.306)^2 = 5.32 = F(.95; 1, 8)$. Remember that the t test
is two-tailed while the F test is one-tailed.

Thus, at a given $\alpha$ level, we can use either the t test or the F test for testing
$\beta_1 = 0$ versus $\beta_1 \neq 0$. Whenever one test leads to $H_0$, so will the other, and
correspondingly for $H_a$. The t test, however, is more flexible since it can be used
for one-sided alternatives involving $\beta_1 (\le\ \ge)\ 0$ versus $\beta_1 (>\ <)\ 0$, while the F
test cannot.
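The identity $(t^*)^2 = F^*$ can be confirmed numerically from the Westwood observations in Figure 3.7. The sketch below computes both statistics independently and compares them; the variable names are illustrative, and the percentile 2.306 is the tabled t(.975; 8) quoted in the text.

```python
import math

# Minimal sketch: check that the squared t statistic equals the F statistic.
X = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]          # lot size
Y = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]    # man-hours

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n
sxx = sum((xi - x_bar) ** 2 for xi in X)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(X, Y))
ssr = sum((b0 + b1 * xi - y_bar) ** 2 for xi in X)

mse = sse / (n - 2)
msr = ssr / 1

s_b1 = math.sqrt(mse / sxx)     # estimated standard deviation of b1
t_star = b1 / s_b1              # (3.17)
f_star = msr / mse              # (3.59)

print("s(b1) =", round(s_b1, 5))
print("t*    =", round(t_star, 2), " (t*)^2 =", round(t_star ** 2, 1))
print("F*    =", round(f_star, 1))

# Same relation for the tabled percentiles quoted in the text:
t_crit = 2.306                  # t(.975; 8)
print("t(.975; 8)^2 =", round(t_crit ** 2, 2))
```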


General linear test. The analysis of variance test of $\beta_1 = 0$ versus $\beta_1 \neq 0$ is
an example of a general test of a linear statistical model. We shall briefly explain
this general test approach in terms of our simple regression model. We do so at
this time because of the generality of the approach and the wide use we shall
make of it, and because of the simplicity of understanding the approach in terms
of our present problem.




We begin with the model, which in this context is called the full or unrestricted
model. For our simple regression case, the full model is:

(3.63)  $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$    Full model

We fit this full model by the method of least squares and obtain the error sum
of squares SSE. In this context, we shall call this sum of squares SSE(F) to
indicate that it measures the variation of the $Y_i$ around the regression line for the
full model.

Next, we consider $H_0$. In this instance, we have:

(3.64)  $H_0: \beta_1 = 0$

The model when $H_0$ holds is called the reduced or restricted model. Here, it is:

(3.65)  $Y_i = \beta_0 + \varepsilon_i$    Reduced model

We fit this reduced model by the method of least squares and obtain the error
sum of squares for this reduced model, denoted by SSE(R). When we fit the
particular reduced model (3.65), it can be shown that the least squares estimator
of $\beta_0$ is $\bar{Y}$. Hence, $\hat{Y}_i = \bar{Y}$ and the error sum of squares for this reduced model is:

(3.66)  $SSE(R) = \sum (Y_i - \bar{Y})^2 = SSTO$

The logic now is to compare SSE(F) and SSE(R). It can be shown that SSE(F)
never is greater than SSE(R):

(3.67)  $SSE(F) \le SSE(R)$

The reason is that the more parameters there are in the model, the better one can
fit the data and the smaller are the deviations around the fitted regression line. If
SSE(F) is not much less than SSE(R), using the full model does not account for
much more of the variability of the $Y_i$ than the reduced model, in which case the
data suggest $H_0$ holds. To put this another way, if SSE(F) is close to SSE(R), the
variation of the observations around the regression line for the full model is
almost as great as the variation around the regression line for the reduced model,
in which case the added parameters in the full model really do not help to reduce
the variation in the $Y_i$. Thus, a small difference SSE(R) - SSE(F) suggests that
$H_0$ holds. On the other hand, a large difference suggests that $H_a$ holds because
the additional parameters in the model do help to reduce substantially the variation
of the observations $Y_i$ around the fitted regression line.

The actual test statistic used is a function of SSE(R) - SSE(F), namely:

(3.68)  $F^* = \frac{SSE(R) - SSE(F)}{df_R - df_F} \div \frac{SSE(F)}{df_F}$

which follows the F distribution if $H_0$ holds. The degrees of freedom $df_R$ and $df_F$
are those associated with the reduced and full model error sums of squares,
respectively. Large values of F* lead to $H_a$.




For  our  application,  we  have: 

$SSE(R) = SSTO$        $SSE(F) = SSE$
$df_R = n - 1$         $df_F = n - 2$

so that we obtain when substituting into (3.68):

$F^* = \frac{SSTO - SSE}{(n - 1) - (n - 2)} \div \frac{SSE}{n - 2} = \frac{SSR}{1} \div \frac{SSE}{n - 2} = \frac{MSR}{MSE}$

which is our old test statistic (3.59).

This  general  approach  can  be  used  for  highly  complex  tests  of  linear  statistical 
models,  as  well  as  for  simpler  tests.  The  basic  steps  again  are: 

1.  Fit  the  full  model  and  obtain  the  error  sum  of  squares  SSE(F). 

2.  Fit  the  reduced  model  under  H0  and  obtain  the  error  sum  of  squares  SSE(R). 

3.  Use  the  test  statistic  (3.68). 
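The three steps above can be sketched in code for the simple regression case. In the sketch below the function names `fit_full`, `fit_reduced`, and `general_linear_test` are hypothetical helpers introduced only for illustration, and the data are the Westwood Company observations listed in Figure 3.7.

```python
# Minimal sketch of the general linear test for H0: beta1 = 0.
X = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]          # lot size
Y = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]    # man-hours


def fit_full(x, y):
    """Step 1: fit Y = b0 + b1*X by least squares; return (SSE(F), df_F)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return sse, n - 2


def fit_reduced(y):
    """Step 2: fit Y = b0 (fitted value is the sample mean); return (SSE(R), df_R)."""
    n = len(y)
    y_bar = sum(y) / n
    sse = sum((yi - y_bar) ** 2 for yi in y)
    return sse, n - 1


def general_linear_test(sse_r, df_r, sse_f, df_f):
    """Step 3: test statistic (3.68)."""
    return ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)


sse_f, df_f = fit_full(X, Y)
sse_r, df_r = fit_reduced(Y)
f_star = general_linear_test(sse_r, df_r, sse_f, df_f)

print("SSE(F) =", sse_f, "df_F =", df_f)
print("SSE(R) =", sse_r, "df_R =", df_r)
print("F* =", round(f_star, 1))
```

Only `fit_full` and `fit_reduced` depend on the particular models being compared; `general_linear_test` takes nothing but the two error sums of squares and their degrees of freedom.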

3.9  DESCRIPTIVE  MEASURES  OF  ASSOCIATION  BETWEEN  X  AND  Y 
IN  REGRESSION  MODEL 

We have discussed the major uses of regression analysis (estimation of parameters
and means and prediction of new observations) without mentioning
the “degree of linear association” between X and Y, or similar terms. The reason
is that the usefulness of estimates or predictions depends upon the width of the
interval and the user’s needs for precision, which vary from one application to
another. Hence, no single descriptive measure of the “degree of linear association”
can capture the essential information as to whether a given regression
relation is useful in any particular application.

Nevertheless, there are times when the degree of linear association is of interest
in its own right. We shall now briefly discuss two descriptive measures that
are frequently used in practice to describe the degree of linear association between
X and Y.


Coefficient  of  determination 

We saw earlier that SSTO measures the variation in the observations $Y_i$, or the
uncertainty in predicting Y, when no account of the independent variable X is
taken. Thus, SSTO is a measure of the uncertainty in predicting Y when X is not
considered. Similarly, SSE measures the variation in the $Y_i$ when a regression
model utilizing the independent variable X is employed. A natural measure of the
effect of X in reducing the variation in Y, i.e., the uncertainty in predicting Y, is
therefore:

(3.69)  $r^2 = \frac{SSTO - SSE}{SSTO} = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}$



The measure $r^2$ is called the coefficient of determination. Since $0 \le SSE \le SSTO$,
it follows that:

(3.70)  $0 \le r^2 \le 1$

We may interpret $r^2$ as the proportionate reduction of total variation associated
with the use of the independent variable X. Thus, the larger is $r^2$, the more is the
total variation of Y reduced by introducing the independent variable X. The
limiting values of $r^2$ occur as follows:

1. If all observations fall on the fitted regression line, SSE = 0 and $r^2 = 1$.
This case is shown in Figure 3.6a. Here, the independent variable X accounts
for all variation in the observations $Y_i$.

2. If the slope of the fitted regression line is $b_1 = 0$ so that $\hat{Y}_i = \bar{Y}$, SSE = SSTO
and $r^2 = 0$. This case is shown in Figure 3.6b. Here, there is no linear
association between X and Y in the sample data, and the independent variable
X is of no help in reducing the variation in the observations $Y_i$ with linear
regression.

In practice, $r^2$ is not likely to be 0 or 1, but rather somewhere in between these
limits. The closer it is to 1, the greater is said to be the degree of linear association
between X and Y.

Coefficient  of  correlation 

The square root of $r^2$:

(3.71)  $r = \pm\sqrt{r^2}$

is called the coefficient of correlation. A plus or minus sign is attached to this
measure according to whether the slope of the fitted regression line is positive or
negative. Thus, the range of r is:

(3.72)  $-1 \le r \le 1$

Whereas $r^2$ indicates the proportional reduction in the variability of Y attained by
the use of information about X, the square root, r, does not have such a clear-cut
operational interpretation. Nevertheless, there is a tendency to use r instead of $r^2$
in much applied work.

It is worth noting that since for any $r^2$ other than 0 or 1, $r^2 < |r|$, r may give
the impression of a “closer” relationship between X and Y than does the corresponding
$r^2$. For instance, $r^2 = .10$ indicates that the total variation in Y is reduced
by only 10 percent when X is introduced, yet $|r| = .32$ may give an
impression of greater linear association between X and Y.


Example 

For the Westwood Company example, we obtained SSTO = 13,660 and SSE
= 60. Hence:

$r^2 = \frac{13{,}660 - 60}{13{,}660} = .996$

Thus, the variation in man-hours is reduced by 99.6 percent when lot size is
considered.

The correlation coefficient in this example is:

$r = +\sqrt{.996} = +.998$

The plus sign is affixed since $b_1$ is positive.
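The arithmetic of this example, including the rule for attaching the sign of $b_1$ to r, can be written out as a short sketch. The inputs are the values given in the text (SSTO = 13,660, SSE = 60, $b_1$ = 2.0); the use of `math.copysign` to carry the sign of the slope is simply one convenient way to express the rule and is a choice made here.

```python
import math

# Minimal sketch: coefficient of determination (3.69) and correlation (3.71).
SSTO = 13660.0
SSE = 60.0
B1 = 2.0          # fitted slope; only its sign is used

r2 = (SSTO - SSE) / SSTO
r = math.copysign(math.sqrt(r2), B1)   # sign of r follows the sign of b1

print("r^2 =", round(r2, 5))
print("r   =", round(r, 5))
```

To five decimals these agree with the R SQUARE and MULTIPLE R entries in the printout of Figure 3.7.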


Computational  formula  for  r 


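A direct computational formula for r, which automatically furnishes the
proper sign, is:

(3.73)  $r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{[\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2]^{1/2}} = \frac{\sum X_i Y_i - \frac{\sum X_i \sum Y_i}{n}}{\left[\left(\sum X_i^2 - \frac{(\sum X_i)^2}{n}\right)\left(\sum Y_i^2 - \frac{(\sum Y_i)^2}{n}\right)\right]^{1/2}}$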
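As an illustration of how the direct formula supplies the sign by itself, the sketch below applies the deviation form of (3.73) to the Westwood observations of Figure 3.7 and then to the same data with Y negated. The function name `correlation` is an assumption made here.

```python
import math

# Minimal sketch: direct computation of r from deviations about the means.
X = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]          # lot size
Y = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]    # man-hours


def correlation(x, y):
    """Coefficient of correlation by the deviation form of (3.73)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r = correlation(X, Y)
r_negated = correlation(X, [-yi for yi in Y])   # same data, slope reversed

print("r =", round(r, 5))
print("r with Y negated =", round(r_negated, 5))
```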




Comments 


1. The following relation between $b_1$ and r is worth noting:

(3.74)  $b_1 = \left[\frac{\sum (Y_i - \bar{Y})^2}{\sum (X_i - \bar{X})^2}\right]^{1/2} r = \frac{s_Y}{s_X}\, r$

where $s_Y = [\sum (Y_i - \bar{Y})^2/(n - 1)]^{1/2}$ and $s_X = [\sum (X_i - \bar{X})^2/(n - 1)]^{1/2}$ are the sample standard
deviations for the Y and X observations, respectively. Note that $b_1 = 0$ when r = 0,
and vice versa. Thus, r = 0 implies a horizontal fitted regression line, and vice versa.

2. The value taken by $r^2$ in a given sample tends to be affected by the spacing of the
X observations. This is implied in (3.69). SSE is not affected systematically by the spacing
of the X’s since for model (3.1), $\sigma^2(Y_i) = \sigma^2$ at all X levels. However, the wider the
spacing is of the X’s in the sample when $b_1 \neq 0$, the greater will tend to be the spread of
the observed Y’s around $\bar{Y}$ and hence the greater will be SSTO. Consequently, the wider
the X’s are spaced, the higher will tend to be $r^2$.

3. The regression sum of squares SSR is often called the “explained variation” in Y.
The residual sum of squares SSE is then called the “unexplained variation,” and the total
sum of squares the “total variation.” The coefficient $r^2$ then is interpreted in terms of the
proportion of the total variation in Y which has been “explained” by X. Unfortunately,
this terminology frequently is taken literally, hence misunderstood. Remember that in a
regression model there is no implication that Y necessarily depends on X in a causal or
explanatory sense.

4. A value of r or $r^2$ relatively close to 1 sometimes is taken as an indication that
sufficiently precise inferences on Y can be made from knowledge of X. As mentioned
earlier, the usefulness of the regression relation depends upon the width of the confidence
or prediction interval and the particular needs for precision, which vary from one application
to another. Hence, no single measure is an adequate indicator of the usefulness of the
regression relation.

5. Regression models do not contain any parameter to be estimated by r or $r^2$. These
coefficients simply are descriptive measures of the degree of linear association between X
and Y in the sample observations which may, or may not, be useful in any one instance. In
a later chapter, we discuss correlation models which do contain a parameter for which r
is an estimator.
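Relation (3.74) in Comment 1 can be checked against the summary figures reported in the printout of Figure 3.7 (MULTIPLE R = 0.99780 and standard deviations 19.4365 and 38.9587). The sketch below does only that arithmetic; the variable names are illustrative.

```python
# Minimal sketch: recover the slope from r and the sample standard deviations.
r = 0.99780        # MULTIPLE R in Figure 3.7
s_x = 19.4365      # standard deviation of SIZE
s_y = 38.9587      # standard deviation of HOURS

b1 = (s_y / s_x) * r      # relation (3.74)
print("b1 from (3.74):", round(b1, 4))
```

Up to the rounding of the printed inputs, this reproduces the fitted slope of 2.0 reported for SIZE.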


3.10  COMPUTER  OUTPUT 

Figure 3.7 shows again the computer printout for the Westwood Company
case presented in Figure 2.12. We referred to selected items in this printout in
Chapter 2. Now, we are in a position to consider the printout as a whole.

The 10 observations on lot size and man-hours are printed at the top. This
enables us to verify that the observations were entered into the computer accurately.
We have annotated the output in Figure 3.7 in terms of the notation used
in this book. The computer package output illustrated in Figure 3.7 does not
provide $s(b_0)$, the estimated standard deviation of $b_0$. However, this estimate can
be easily calculated from the data given in the computer output. Note in this
connection that the denominator term $\sum (X_i - \bar{X})^2$ in (3.21) is equal to $(n - 1)s_X^2$,
and that $s_X$ is given in the computer output.
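The calculation suggested here can be sketched as follows. The sketch assumes that (3.21) is the usual expression $s^2(b_0) = MSE\,[1/n + \bar{X}^2 / \sum (X_i - \bar{X})^2]$, which is not restated in this section, and it takes all inputs from the printout of Figure 3.7 (n = 10, mean of SIZE = 50.0000, standard deviation of SIZE = 19.4365, STANDARD ERROR = 2.73861).

```python
import math

# Minimal sketch: estimated standard deviation of b0 from printout summaries.
n = 10
x_bar = 50.0          # mean of SIZE
s_x = 19.4365         # standard deviation of SIZE
root_mse = 2.73861    # STANDARD ERROR in the printout, i.e. sqrt(MSE)

sxx = (n - 1) * s_x ** 2          # sum of squared deviations of X
mse = root_mse ** 2

# Assumed form of (3.21): s^2(b0) = MSE * [1/n + X_bar^2 / Sxx]
s_b0 = math.sqrt(mse * (1.0 / n + x_bar ** 2 / sxx))

print("Sxx   =", round(sxx, 1))
print("s(b0) =", round(s_b0, 4))
```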




FIGURE 3.7  Segment of computer output for regression run on Westwood Company
data (SPSS, Ref. 3.1)

        VARIABLES
  1  SIZE       2  HOURS
   30.0000        73.0000
   20.0000        50.0000
   60.0000       128.0000
   80.0000       170.0000
   40.0000        87.0000
   50.0000       108.0000
   60.0000       135.0000
   30.0000        69.0000
   70.0000       148.0000
   60.0000       132.0000

DEPENDENT VARIABLE..  HOURS

VARIABLE(S) ENTERED ON STEP NUMBER 1..  SIZE

MULTIPLE R          0.99780   ← r
R SQUARE            0.99561   ← r²
STANDARD ERROR      2.73861   ← √MSE

VARIABLES IN THE EQUATION
VARIABLE       B            STD ERROR B [s(b1)]    F
SIZE           2.000000     0.04697                1813.3334
(CONSTANT)     10.00000

VARIABLE     MEAN               STANDARD DEV        CASES (n)
SIZE         X̄ → 50.0000        s_X → 19.4365       10
HOURS        Ȳ → 110.0000       s_Y → 38.9587       10

ANALYSIS OF VARIANCE    DF    SUM OF SQUARES         MEAN SQUARE
REGRESSION              1.    SSR → 13600.00000      MSR → 13600.00000
RESIDUAL (← Error)      8.    SSE → 60.00000         MSE → 7.50000


Computer  printouts  for  regression  analysis  programs  differ  substantially  in 
format  from  one  program  to  another.  In  addition,  differences  in  the  computed 
results  may  occur  because  different  program  packages  do  not  control  roundoff 
errors  equally  well.  Before  using  a  computer  program  the  first  time,  it  is  a  good 
idea  to  check  it  on  a  set  of  data  for  which  the  exact  results  are  known. 


PROBLEMS 

3.1. A student, working on a summer internship in the economic research office of a
large corporation, studied the relation between sales of a product (Y, in million
dollars) and population (X, in million persons) in the firm’s 50 marketing districts.
Regression model (3.1) was employed. The student first wished to test whether or
not a linear association between Y and X existed. Using a time-sharing computer
service available to the firm, the student accessed an interactive simple linear
regression program and obtained the following information on the regression
coefficients:

                                        95 Percent
Parameter    Estimated Value        Confidence Limits

Intercept    7.43119              -1.18518     16.0476
Slope         .755048               .452886     1.05721

a.  The  student  concluded  from  these  results  that  there  is  a  linear  association 
between  Y  and  X.  Is  the  conclusion  warranted?  What  is  the  implied  level  of 
significance? 

b.  Someone  questioned  the  negative  lower  confidence  limit  for  the  intercept, 
pointing  out  that  dollar  sales  cannot  be  negative  even  if  the  population  in  a 
district  is  zero.  Discuss. 

3.2. In a test of the alternatives $H_0: \beta_1 \le 0$ versus $H_a: \beta_1 > 0$, an analyst concluded
$H_0$. Does this conclusion imply that there is no linear association between X and
Y? Explain.

3.3. A member of a student team playing an interactive marketing game received the
following computer output when studying the relation between advertising expenditures
(X) and sales (Y) for one of the team’s products:

Estimated regression equation: $\hat{Y} = 350.7 - .18X$
Two-sided P-value for estimated slope: .91

The student stated: “The message I get here is that the more we spend on advertising
this product, the fewer units we sell!” Comment.

3.4. Refer to Grade point average Problem 2.15. Some additional results are: $b_0 =
-1.700$, $s(b_0) = .7267$, $b_1 = .8399$, $s(b_1) = .144$, MSE = .1892.

a. Obtain a 99 percent confidence interval for $\beta_1$. Interpret your confidence
interval. Does it include zero? Why might the director of admissions be
interested in whether the confidence interval includes zero?

b. Test, using the test statistic t*, whether or not a linear association exists
between student’s entrance test score (X) and GPA at the end of the freshman
year (Y). Use a level of significance of .01. State the alternatives, decision
rule, and conclusion.

c. What is the P-value of your test in part (b)? How does it support the conclusion
reached in part (b)?

3.5. Refer to Calculator maintenance Problem 2.16. Some additional results are:
$b_0 = -2.3221$, $s(b_0) = 2.564$, $b_1 = 14.738$, $s(b_1) = .519$, MSE = 20.086.

a. Estimate the change in the mean service time when the number of machines
serviced increases by one. Use a 90 percent confidence interval. Interpret
your confidence interval.

b. Conduct a t test to determine whether or not there is a linear association
between X and Y here; control the $\alpha$ risk at .10. State the alternatives, decision
rule, and conclusion. What is the P-value of your test?

c. Are your results in parts (a) and (b) consistent? Explain.

d. The manufacturer has suggested that the mean required time should not increase
by more than 14 minutes for each additional machine that is serviced
on a service call. Conduct a test to decide whether this standard is being
satisfied by Tri-City. Control the risk of a Type I error at .05. State the
alternatives, decision rule, and conclusion. What is the P-value of the test?

e. Does $b_0$ give any relevant information here about the “start-up” time on
calls, i.e., about the time required before service work is begun on the
machines at a customer location?

3.6. Refer to Airfreight breakage Problem 2.17.

a. Estimate $\beta_1$ with a 95 percent confidence interval. Interpret your interval
estimate.

b. Conduct a t test to decide whether or not there is a linear association between
number of times a carton is transferred (X) and number of broken ampules
(Y). Use a level of significance of .05. State the alternatives, decision rule,
and conclusion. What is the P-value of the test?

c. $\beta_0$ represents here the mean number of ampules broken when no transfers of
the shipment are made, i.e., when X = 0. Obtain a 95 percent confidence
interval for $\beta_0$ and interpret it.

d. A consultant has suggested, based on previous experience, that the mean
number of broken ampules should not exceed 9 when no transfers are made.
Conduct an appropriate test using $\alpha = .025$. State the alternatives, decision
rule, and conclusion. What is the P-value of the test?

e. Obtain the power of your test in part (b) if actually $\beta_1 = 2.0$. Assume $\sigma(b_1)
= .50$. Also obtain the power of your test in part (d) if actually $\beta_0 = 11$.
Assume $\sigma(b_0) = .75$.

3.7. Refer to Plastic hardness Problem 2.18.

a. Estimate the change in the mean hardness when the elapsed time increases by
one hour. Use a 99 percent confidence interval. Interpret your interval estimate.

b. The plastic manufacturer has stated that the mean hardness should increase by
2 Brinell units per hour. Conduct a two-sided test to decide whether this
standard is being satisfied; use $\alpha = .01$. State the alternatives, decision rule,
and conclusion. What is the P-value of the test?

c. Obtain the power of your test in part (b) if the standard actually is being
exceeded by .5 Brinell units per hour. Assume $\sigma(b_1) = .16$.

3.8. Refer to Figure 3.7 for the Westwood Company example. A consultant has advised
that an increase of one unit in lot size should require an increase of 1.8 in the
expected number of man-hours for the given production item.

a. Conduct a test to decide whether or not the increase in the expected number of
man-hours in the Westwood Company equals this standard. Use $\alpha = .05$.
State the alternatives, decision rule, and conclusion.

b. Obtain the power of your test in part (a) if the consultant’s standard actually is
being exceeded by .1 hour. Assume $\sigma(b_1) = .05$.

c. Why is F* = 1813.333, given in the printout, not relevant for the test in part
(a)?

3.9.  Refer  to  Figure  3.7.  A  student,  noting  that  j(&i)  is  furnished  in  the  printout,  asks 
why  s{Yh)  is  not  also  given.  Discuss. 

3.10. For each of the following questions, explain whether a confidence interval
for a mean response or a prediction interval for a new observation is
appropriate.

a. What will be the humidity level in this greenhouse tomorrow when we set the
temperature level at 31° C?

b. How much do families whose disposable income is $23,500 spend, on the
average, for meals away from home?

c. How many kilowatt-hours of electricity will be consumed next month by
commercial and industrial users in the Twin Cities service area, given that the
index of business activity for the area remains at its present level?

3.11. A person asks if there is a difference between the “mean response at
$X = X_h$” and the “mean of m new observations at $X = X_h$.” Reply.

3.12. Can $\sigma^2(Y_{h(\text{new})})$ in (3.36) be brought increasingly close
to 0 as n becomes large? Is this also the case for $\sigma^2(\hat{Y}_h)$ in
(3.28b)? What is the implication of this difference?

3.13.  Refer  to  Grade  point  average  Problems  2.15  and  3.4. 

a.  Obtain  a  95  percent  interval  estimate  of  the  mean  freshman  GPA  for  students 
whose  entrance  test  score  is  4.7.  Interpret  your  confidence  interval. 

b.  Mary  Jones  obtained  a  score  of  4.7  on  the  entrance  test.  Predict  her  freshman 
GPA  using  a  95  percent  prediction  interval.  Interpret  your  prediction  interval. 

c.  Is  the  prediction  interval  in  part  (b)  wider  than  the  confidence  interval  in  part 
(a)?  Should  it  be? 

3.14.  Refer  to  Calculator  maintenance  Problems  2.16  and  3.5. 

a.  Obtain  a  90  percent  confidence  interval  for  the  mean  service  time  on  calls  in 
which  six  machines  are  serviced.  Interpret  your  confidence  interval. 

b.  Obtain  a  90  percent  prediction  interval  for  the  service  time  on  the  next  call  in 
which  six  machines  are  serviced.  Is  your  prediction  interval  wider  than  the 
corresponding  confidence  interval  in  part  (a)?  Should  it  be? 

c.  Suppose  that  management  wishes  to  estimate  the  expected  service  time  per 
machine  on  calls  in  which  six  machines  are  serviced.  Obtain  an  appropriate 
confidence  interval  by  converting  the  interval  obtained  in  part  (a).  Interpret 
the  converted  confidence  interval. 

3.15.  Refer  to  Airfreight  breakage  Problem  2.17. 

a. Because of changes in airline routes, shipments may have to be transferred
more frequently than in the past. Estimate the mean breakage for the following
numbers of transfers: X = 2, 4. Use separate 99 percent confidence intervals.
Interpret your results.

b.  The  next  shipment  will  entail  two  transfers.  Obtain  a  99  percent  prediction 
interval  for  the  number  of  broken  ampules  for  this  shipment.  Interpret  your 
prediction  interval. 

c.  In  the  next  several  days,  three  independent  shipments  will  be  made,  each 
entailing  two  transfers.  Obtain  a  99  percent  prediction  interval  for  the  mean 
number  of  ampules  broken  in  the  three  shipments.  Convert  this  interval  into  a 
99  percent  prediction  interval  for  the  total  number  of  ampules  broken  in  the 
three  shipments. 

3.16.  Refer  to  Plastic  hardness  Problem  2.18. 

a.  Obtain  a  98  percent  confidence  interval  for  the  mean  hardness  of  molded 
items  with  an  elapsed  time  of  60  hours.  Interpret  your  confidence  interval. 

b.  Obtain  a  98  percent  prediction  interval  for  the  hardness  of  a  newly  molded 
test  item  with  an  elapsed  time  of  60  hours. 

c.  Obtain  a  98  percent  prediction  interval  for  the  mean  hardness  of  10  newly 
molded  test  items,  each  with  an  elapsed  time  of  60  hours. 

d.  Is  the  prediction  interval  in  part  (c)  narrower  than  the  one  in  part  (b)?  Should 
it  be? 




3.17. An analyst fitted regression model (3.1) and conducted an F test of
$\beta_1 = 0$ versus $\beta_1 \neq 0$. The P-value of the test was .033, and the
analyst concluded $H_a$: $\beta_1 \neq 0$. Was the $\alpha$ level used by the
analyst greater than or smaller than .033? If the $\alpha$ level had been .01,
what would have been the appropriate conclusion?

3.18. For conducting statistical tests concerning the parameter $\beta_1$, why
is the t test more versatile than the F test?

3.19. When testing whether or not $\beta_1 = 0$, why is the F test a one-sided
test even though $H_a$ includes both $\beta_1 < 0$ and $\beta_1 > 0$? [Hint:
Refer to (3.57).]

3.20. A student asks whether $r^2$ is a point estimator of any parameter in
regression model (3.1). Respond.

3.21. A value of $r^2$ near 1 is sometimes interpreted to imply that the
relation between Y and X is sufficiently close so that suitably precise
predictions of Y can be made from knowledge of X. Is this implication a
necessary consequence of the definition of $r^2$?

3.22. Using regression model (3.1) in an engineering safety experiment, a
researcher found for the first 10 observations that $r^2$ was zero. Is it
possible that for the complete set of 30 observations $r^2$ will not be zero?
Could $r^2$ not be zero for the first 10 observations, yet equal zero for all 30
observations? Explain.

3.23. Refer to Grade point average Problems 2.15 and 3.4. Some additional
calculational results are: SSE = 3.406, SSR = 6.434.

a. Set up the ANOVA table.

b. What is estimated by MSR in your ANOVA table? By MSE? Under what
condition do MSR and MSE estimate the same quantity?

c. Conduct an F test of whether or not $\beta_1 = 0$. Control the $\alpha$ risk
at .01. State the alternatives, decision rule, and conclusion.

d. What is the absolute magnitude of the reduction in the variation of Y when X
is introduced into the regression model? What is the relative reduction? What
is the name of the latter measure?

e. Obtain r and attach the appropriate sign.

f. Which measure, $r^2$ or r, has the more clear-cut operational interpretation?
Explain.

3.24. Refer to Calculator maintenance Problems 2.16 and 3.5. Some additional
calculational results are: SSE = 321.396, SSR = 16,182.604.

a. Set up the basic ANOVA table in the format of Table 3.2. Which elements of
your table are additive? Also set up the ANOVA table in the format of Table
3.4a. How do the two tables differ?

b. Conduct an F test to determine whether or not there is a linear association
between time spent and number of machines serviced; use $\alpha = .10$. State
the alternatives, decision rule, and conclusion.

c. By how much, relatively, is the total variation in number of minutes spent on
a call reduced when the number of machines serviced is introduced into the
analysis? Is this a relatively small or large reduction? What is the name of this
measure?

d. Calculate r and attach the appropriate sign.

e. Which measure, r or $r^2$, has the more clear-cut operational interpretation?




3.25.  Refer  to  Airfreight  breakage  Problem  2.17. 

a.  Set  up  the  ANOVA  table.  Which  elements  are  additive? 

b. Conduct an F test to decide whether or not there is a linear association
between the number of times a carton is transferred and the number of broken
ampules; control the $\alpha$ risk at .05. State the alternatives, decision
rule, and conclusion.

c. Obtain the t* statistic for the test in part (b) and demonstrate its
equivalence to the F* statistic obtained in part (b).

d. Calculate $r^2$ and r. What proportion of the variation in Y is accounted for
by introducing X into the regression model?

3.26.  Refer  to  Plastic  hardness  Problem  2.18. 

a.  Set  up  the  ANOVA  table. 

b. Test by means of an F test whether or not there is a linear association between
the hardness of the plastic and the elapsed time. Use $\alpha = .01$. State the
alternatives, decision rule, and conclusion.

c. Plot the deviations $Y_i - \hat{Y}_i$ against $X_i$ on a graph. Plot the
deviations $\hat{Y}_i - \bar{Y}$ against $X_i$ on another graph. From your two
graphs, does SSE or SSR appear to be the larger component of SSTO?

d. Calculate $r^2$ and r.

3.27.  Refer  to  Muscle  mass  Problem  2.23. 

a.  Conduct  a  test  to  decide  whether  or  not  there  is  a  negative  linear  association 
between  amount  of  muscle  mass  and  age.  Control  the  risk  of  Type  I  error  at 
.05.  State  the  alternatives,  decision  rule,  and  conclusion.  What  is  the  P-value 
of  the  test? 

b. The two-sided P-value for $b_0$ is 0+. Can it now be concluded that $b_0$
provides relevant information on the amount of muscle mass at birth for a female
child?

c. Estimate with a 95 percent confidence interval the difference in expected
muscle mass for women whose ages differ by one year. Why is it not necessary to
know the specific ages to make this estimate?

3.28.  Refer  to  Muscle  mass  Problem  2.23. 

a.  Obtain  a  95  percent  confidence  interval  for  the  mean  muscle  mass  for  women 
of  age  60.  Interpret  your  confidence  interval. 

b.  Obtain  a  95  percent  prediction  interval  for  the  muscle  mass  of  a  woman 
whose  age  is  60.  Is  the  prediction  interval  relatively  precise? 

3.29.  Refer  to  Muscle  mass  Problem  2.23. 

a. Plot the deviations $Y_i - \hat{Y}_i$ against $X_i$ on one graph. Plot the
deviations $\hat{Y}_i - \bar{Y}$ against $X_i$ on another graph. From your
graphs, does SSE or SSR appear to be the larger component of SSTO?

b. Set up the ANOVA table.

c. Test whether or not $\beta_1 = 0$ using an F test with $\alpha = .10$. State
the alternatives, decision rule, and conclusion.

d. What proportion of the total variation in muscle mass remains “unexplained”
when age is introduced into the analysis? Is this proportion relatively small or
large?

e. Obtain $r^2$ and r.




3.30.  Refer  to  Robbery  rate  Problem  2.24. 

a. Test whether or not there is a linear association between robbery rate and
population density using a t test with $\alpha = .01$. State the alternatives,
decision rule, and conclusion. What is the P-value of the test?

b. Test whether or not $\beta_0 = 0$; control the risk of Type I error at .01.
State the alternatives, decision rule, and conclusion. Why might there be
interest in testing whether or not $\beta_0 = 0$?

c. Estimate $\beta_1$ with a 99 percent confidence interval. Interpret your
interval estimate.

3.31.  Refer  to  Robbery  rate  Problem  2.24. 

a.  Set  up  the  ANOVA  table. 

b. Carry out the test in Problem 3.30a by means of the F test. Show the
equivalence of the two test statistics and decision rules. Is the P-value for
the F test the same as that for the t test?

c.  By  how  much  is  the  total  variation  in  robbery  rate  reduced  when  population 
density  is  introduced  into  the  analysis?  Is  this  a  relatively  large  or  small 
reduction? 

d.  Obtain  r. 

3.32.  Refer  to  Robbery  rate  Problems  2.24  and  3.30.  Suppose  that  the  test  in  Problem 
3.30a  is  to  be  carried  out  by  means  of  a  general  linear  test. 

a.  State  the  full  and  reduced  models. 

b. Obtain: (1) SSE(F), (2) SSE(R), (3) $df_F$, (4) $df_R$, (5) test statistic F*
for the general linear test, (6) decision rule.

c. Are the test statistic F* and the decision rule for the general linear test
equivalent to those in Problem 3.30a?

3.33. In empirically developing a cost function from observed data on a complex
chemical experiment, an analyst employed regression model (3.1). $\beta_0$ was
interpreted here as the cost of setting up the experiment. The analyst
hypothesized that this cost should be $7.5 thousand and wished to test the
hypothesis by means of a general linear test.

a. Indicate the alternative conclusions for the test.

b. Specify the full and reduced models.

c. Without additional information, can you tell what the quantity
$df_R - df_F$ in test statistic (3.68) will equal in the analyst’s test?
Explain.

3.34.  Refer  to  Grade  point  average  Problem  2.15. 

a. Would it be more reasonable to consider the $X_i$ as known constants or as
random variables here? Explain.

b. If the $X_i$ were considered to be random variables, would this have any
effect on prediction intervals for new applicants? Explain.

3.35. Refer to Calculator maintenance Problems 2.16 and 3.5. How would the
meaning of the confidence coefficient in Problem 3.5a change if the independent
variable were considered a random variable and the conditions in (3.40) were
applicable?


EXERCISES 


3.36. Show that $b_0$ as defined in (3.19) is an unbiased estimator of
$\beta_0$.

3.37. Derive the expression in (3.20b) for the variance of $b_0$, making use of
theorem (3.29). Also explain how variance (3.20b) is a special case of variance
(3.28b).

3.38.  (Calculus  needed.) 

a. Obtain the likelihood function for the sample observations
$Y_1, \ldots, Y_n$ given $X_1, \ldots, X_n$, if the conditions in (3.40) apply.

b. Obtain the maximum likelihood estimators of $\beta_0$, $\beta_1$, and
$\sigma^2$. Are the estimators of $\beta_0$ and $\beta_1$ the same as those in
(2.27) when the $X_i$ are fixed?

3.39. Suppose that the normal error regression model (3.1) is applicable except
that the error variance is not constant; rather the variance is larger, the
larger is X. Does $\beta_1 = 0$ still imply that there is no linear association
between X and Y? That there is no association between X and Y? Explain.

3.40. Derive the expression for SSR in (3.50b).

3.41. In a small-scale regression study, five observations on Y were obtained
corresponding to X = 1, 4, 10, 11, and 14. Assume that $\sigma = .6$,
$\beta_0 = 5$, and $\beta_1 = 3$.

a.  What  are  the  expected  values  of  MSR  and  MSE  here? 

b.  For  purposes  of  determining  whether  or  not  a  regression  relation  exists, 
would  it  have  been  better  or  worse  to  have  made  the  five  observations  at 
X  =  6,  7,  8,  9,  and  10?  Why?  Would  the  same  answer  apply  if  the  principal 
purpose  were  to  estimate  the  mean  response  for  X  =  8?  Discuss. 

3.42.  The  simple  linear  regression  model  (3.1)  is  assumed  to  be  applicable. 

a. When testing $H_0$: $\beta_1 = 5$ versus $H_a$: $\beta_1 \neq 5$ by means of
a general linear test, what is the reduced model? What is $df_R$?

b. When testing $H_0$: $\beta_0 = 2$, $\beta_1 = 5$ versus $H_a$: not both
$\beta_0 = 2$ and $\beta_1 = 5$, what is the reduced model? What is $df_R$?


PROJECTS 

3.43. Refer to the SMSA data set and Project 2.38. Using $r^2$ as the criterion,
which independent variable accounts for the largest reduction in the variability
in the number of active physicians?

3.44. Refer to the SMSA data set and Project 2.39. Obtain a separate interval
estimate of $\beta_1$ for each region. Use a 90 percent confidence coefficient
in each case. Do the regression lines for the different regions appear to have
similar slopes?

3.45. Refer to the SENIC data set and Project 2.40. Using $r^2$ as the
criterion, which independent variable accounts for the largest reduction in the
variability of the average length of stay?

3.46. Refer to the SENIC data set and Project 2.41. Obtain a separate interval
estimate of $\beta_1$ for each region. Use a 95 percent confidence coefficient
in each case. Do the regression lines for the different regions appear to have
similar slopes?

3.47. Five observations on Y are to be taken when X = 4, 8, 12, 16, and 20,
respectively. The true regression function is $E(Y) = 20 + 4X$, and the
$\varepsilon_i$ are independent $N(0, 25)$.

a. Generate five normal random numbers, with mean 0 and variance 25. Consider
these random numbers as the error terms for the five observations at
X = 4, 8, 12, 16, and 20, and calculate $Y_1$, $Y_2$, $Y_3$, $Y_4$, and $Y_5$.
Obtain the least squares estimates $b_0$ and $b_1$ when fitting a straight line
to the five observations. Also calculate $\hat{Y}_h$ when $X_h = 10$.

b. Repeat part (a) 200 times, generating new random numbers each time.

c. Make a frequency distribution of the 200 estimates $b_1$. Calculate the mean
and standard deviation of the 200 estimates $b_1$. Are the results consistent
with theoretical expectations?

d. For each of the 200 replications, calculate a 95 percent confidence interval
for $E(Y_h)$ when $X_h = 10$. What proportion of the 200 confidence intervals
include $E(Y_h)$? Is this result consistent with theoretical expectations?
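
The simulation called for in Project 3.47 can be organized along the following
lines. This is only a sketch under stated assumptions, not a prescribed
solution: it takes the regression function to be $E(Y) = 20 + 4X$ with error
standard deviation 5, uses the tabled value t(.975; 3) = 3.182 for the
confidence limits, and the seed and the function name `one_replication` are
arbitrary choices made for illustration.

```python
import random
import statistics

X = [4, 8, 12, 16, 20]
BETA0, BETA1, SIGMA = 20.0, 4.0, 5.0    # assumed true values; SIGMA**2 = 25
X_H = 10
T_CRIT = 3.182                          # t(.975; n - 2 = 3), from a t table
N = len(X)
X_BAR = sum(X) / N
SXX = sum((x - X_BAR) ** 2 for x in X)  # sum of squared deviations of X

def one_replication(rng):
    """Simulate five observations, fit the line, return b1 and the CI for E(Y_h)."""
    y = [BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA) for x in X]
    y_bar = sum(y) / N
    b1 = sum((x - X_BAR) * (yi - y_bar) for x, yi in zip(X, y)) / SXX
    b0 = y_bar - b1 * X_BAR
    sse = sum((yi - (b0 + b1 * x)) ** 2 for x, yi in zip(X, y))
    mse = sse / (N - 2)
    y_hat_h = b0 + b1 * X_H
    # Estimated standard deviation of the fitted value at X_h.
    s_y_hat_h = (mse * (1 / N + (X_H - X_BAR) ** 2 / SXX)) ** 0.5
    return b1, y_hat_h - T_CRIT * s_y_hat_h, y_hat_h + T_CRIT * s_y_hat_h

rng = random.Random(1983)               # arbitrary seed, for repeatability
results = [one_replication(rng) for _ in range(200)]

b1_values = [r[0] for r in results]
mean_b1 = statistics.mean(b1_values)
sd_b1 = statistics.stdev(b1_values)
theoretical_sd_b1 = SIGMA / SXX ** 0.5  # sigma(b1) = sigma / sqrt(SXX)

true_mean = BETA0 + BETA1 * X_H         # E(Y_h) at X_h = 10
coverage = sum(lo <= true_mean <= hi for _, lo, hi in results) / len(results)

print("mean of b1:", round(mean_b1, 3))
print("sd of b1:", round(sd_b1, 3), "theoretical:", round(theoretical_sd_b1, 3))
print("proportion of intervals covering E(Y_h):", coverage)
```

The mean of the 200 slopes can then be compared with 4, their standard
deviation with $\sigma(b_1)$, and the coverage proportion with .95.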

CITED  REFERENCE 

3.1 Nie, N. H.; C. H. Hull; J. G. Jenkins; K. Steinbrenner; and D. H. Bent. SPSS:
Statistical Package for the Social Sciences. 2d ed. New York: McGraw-Hill, 1975.


4 


Aptness  of  model  and 
remedial  measures 


When a regression model, such as the simple linear regression model (3.1), is
selected for an application, one can usually not be certain in advance that the
model is appropriate for that application. Any one, or several, of the features of
the model, such as linearity of the regression function or normality of the error
terms, may not be appropriate for the particular data at hand. Hence, it is
important to examine the aptness of the model for the data before further
analysis based on that model is undertaken. In this chapter, we discuss some
simple graphic methods for studying the aptness of a model, as well as some
formal statistical tests for doing so. We conclude with a consideration of some
techniques whereby the simple regression model (3.1) can be made appropriate when
the data do not accord with the conditions of the model.

While the discussion in this chapter is in terms of the aptness of the simple
regression model (3.1), the basic principles apply to all statistical models
discussed in this book. In later chapters, additional material concerning the
aptness of the model and remedial measures will be presented.


4.1  RESIDUALS 

A residual $e_i$, as defined in (2.16), is the difference between the observed
value and the fitted value:

(4.1)    $e_i = Y_i - \hat{Y}_i$

As such, it may be regarded as the observed error, in distinction to the unknown
true error $\varepsilon_i$ in the regression model:

(4.2)    $\varepsilon_i = Y_i - E(Y_i)$

For regression model (3.1), the $\varepsilon_i$ are assumed to be independent
normal random variables, with mean 0 and constant variance $\sigma^2$. If the
model is appropriate for the data at hand, the observed residuals $e_i$ should
then reflect the properties assumed for the $\varepsilon_i$. This is the basic
idea underlying residual analysis, a highly useful means of examining the
aptness of a model.


Properties  of  residuals 


The mean of the n residuals $e_i$ is by (2.17):

(4.3)    $\bar{e} = \frac{\sum e_i}{n} = 0$

where $\bar{e}$ denotes the mean of the residuals. Thus, since $\bar{e}$ is
always 0, it provides no information as to whether the true errors
$\varepsilon_i$ have expected value $E(\varepsilon_i) = 0$.

The variance of the n residuals $e_i$ is defined as follows for model (3.1):

(4.4)    $\frac{\sum (e_i - \bar{e})^2}{n - 2} = \frac{\sum e_i^2}{n - 2} = \frac{SSE}{n - 2} = MSE$

If the model is appropriate, MSE is, as noted earlier, an unbiased estimator of the
variance of the error terms $\sigma^2$.
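
Properties (4.3) and (4.4) can be checked numerically. The following sketch
uses the transit example data of Table 4.1; the helper `fit_line` and the
variable names are illustrative choices, not part of the text.

```python
# Transit example data from Table 4.1 (both variables in thousands).
X = [80, 220, 140, 120, 180, 100, 200, 160]   # maps distributed
Y = [0.60, 6.70, 5.30, 4.00, 6.55, 2.15, 6.60, 5.75]   # increase in ridership

def fit_line(x, y):
    """Return the least squares estimates (b0, b1) for a straight line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

b0, b1 = fit_line(X, Y)
n = len(X)

# Residuals e_i = Y_i - fitted value, as in (4.1).
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(X, Y)]

e_bar = sum(residuals) / n              # (4.3): zero up to rounding error
sse = sum(e ** 2 for e in residuals)
mse = sse / (n - 2)                     # (4.4): variance of the residuals

print("b0 =", round(b0, 2), " b1 =", round(b1, 4))
print("mean residual =", round(e_bar, 10))
print("MSE =", round(mse, 4))
```

The fitted coefficients agree with the rounded equation reported beneath
Table 4.1, and the mean residual vanishes apart from floating-point error.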


Standardized  residuals 

For analytical convenience, standardized residuals are used at times in residual
analysis. Since the standard deviation of the error terms $\varepsilon_i$ is
$\sigma$, which is estimated by $\sqrt{MSE}$, we shall define here the
standardized residual as follows:

$\frac{e_i - \bar{e}}{\sqrt{MSE}} = \frac{e_i}{\sqrt{MSE}}$

We shall explain residual analysis mainly in terms of the residuals $e_i$, but
occasionally will employ the standardized residuals.
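
As a brief illustration, the standardized residuals for the transit example of
Table 4.1 can be computed as sketched below. Each residual is simply divided by
the square root of MSE; the helper name `fit_line` is an illustrative
assumption.

```python
import math

# Transit example data from Table 4.1 (both variables in thousands).
X = [80, 220, 140, 120, 180, 100, 200, 160]
Y = [0.60, 6.70, 5.30, 4.00, 6.55, 2.15, 6.60, 5.75]

def fit_line(x, y):
    """Return the least squares estimates (b0, b1) for a straight line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1

b0, b1 = fit_line(X, Y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(X, Y)]
mse = sum(e ** 2 for e in residuals) / (len(X) - 2)

# Standardized residual: e_i / sqrt(MSE); the mean residual is 0, so
# subtracting it would change nothing.
standardized = [e / math.sqrt(mse) for e in residuals]

for city, (e, z) in enumerate(zip(residuals, standardized), start=1):
    print(f"city {city}: residual {e:+.2f}, standardized {z:+.2f}")
```

Because the standardized residuals are the residuals rescaled by one common
constant, a plot of either set shows the same pattern; only the vertical scale
changes.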


Nonindependence  of  residuals 

The residuals $e_i$ are not independent random variables because they involve
the fitted values $\hat{Y}_i$ which are based on the sample estimates $b_0$ and
$b_1$. Thus, the residuals are associated with only $n - 2$ degrees of freedom.
As a result, we know from (2.17) that the sum of the $e_i$ must be 0 and from
(2.19) that the products $X_i e_i$ must sum to 0. The same lack of independence
holds for the standardized residuals.
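
One way to see what the $n - 2$ degrees of freedom mean is to note that any two
residuals can be recovered from the other $n - 2$ by means of the two
constraints. The sketch below does this for the transit example of Table 4.1;
the helper and variable names are illustrative assumptions.

```python
# Transit example data from Table 4.1 (both variables in thousands).
X = [80, 220, 140, 120, 180, 100, 200, 160]
Y = [0.60, 6.70, 5.30, 4.00, 6.55, 2.15, 6.60, 5.75]

def fit_line(x, y):
    """Return the least squares estimates (b0, b1) for a straight line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1

b0, b1 = fit_line(X, Y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(X, Y)]

# The two constraints: sum of e_i = 0 and sum of X_i * e_i = 0.
sum_e = sum(residuals)
sum_xe = sum(xi * e for xi, e in zip(X, residuals))

# Pretend the first two residuals are unknown and solve the constraints
#   e1 + e2 = -S   and   X1*e1 + X2*e2 = -T
# where S and T are the corresponding sums over the remaining n - 2 cases.
S = sum(residuals[2:])
T = sum(xi * e for xi, e in zip(X[2:], residuals[2:]))
e1_recovered = (X[1] * S - T) / (X[0] - X[1])
e2_recovered = -S - e1_recovered

print("sum of residuals:", round(sum_e, 10))
print("sum of X times residuals:", round(sum_xe, 8))
print("recovered e1, e2:", round(e1_recovered, 4), round(e2_recovered, 4))
```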




When the sample size is large in comparison to the number of parameters in
the regression model, the dependency effect among the $e_i$ is relatively
unimportant and can be ignored for most purposes.

Departures  from  model  to  be  studied  by  residuals 

We  shall  consider  the  use  of  residuals  for  examining  six  important  types  of 
departures  from  model  (3.1),  the  simple  linear  regression  model  with  normal 
errors: 

1.  The  regression  function  is  not  linear. 

2.  The  error  terms  do  not  have  constant  variance. 

3.  The  error  terms  are  not  independent. 

4.  The  model  fits  all  but  one  or  a  few  outlier  observations. 

5.  The  error  terms  are  not  normally  distributed. 

6.  One  or  several  important  independent  variables  have  been  omitted  from  the 
model. 

4.2  GRAPHIC  ANALYSIS  OF  RESIDUALS 

We  take  up  now  some  informal  ways  in  which  graphs  of  residuals  can  be 
analyzed  to  provide  information  on  whether  any  of  the  six  types  of  departures 
from  the  simple  linear  regression  model  (3.1)  just  mentioned  are  present. 

Nonlinearity  of  regression  function 

Whether  or  not  a  linear  regression  function  is  appropriate  for  the  data  being 
analyzed  can  often  be  studied  from  a  scatter  plot  of  the  data,  with  the  fitted 
regression  function  plotted  on  it.  Figure  4.1a  contains  the  data  and  the  fitted 
regression  line  for  a  study  of  the  relation  between  amount  of  transit  information 
and  bus  ridership  in  eight  comparable  test  cities,  where  X  is  the  number  of  bus 
transit  maps  distributed  free  to  residents  of  the  city  at  the  beginning  of  the  test 
period  and  Y  is  the  increase  during  the  test  period  in  average  daily  bus  ridership 
during  nonpeak  hours.  The  original  data  and  fitted  values  are  given  in  Table  4.1. 
The  graph  suggests  strongly  that  a  linear  regression  function  is  not  appropriate. 

Figure 4.1b presents for this same example the residuals $e_i$, shown in Table
4.1, plotted against the independent variable X. The lack of fit of a linear
regression function is also strongly suggested by the residual plot against X in
Figure 4.1b, since the residuals depart from 0 in a systematic fashion. Note that
they are negative for smaller X values, positive for medium size X values, and
negative again for large X values.
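
The systematic sign pattern can also be read off without a graph. As a small
sketch, the residuals for the transit example of Table 4.1 are listed below in
increasing order of X together with their signs; the helper and variable names
are illustrative assumptions.

```python
# Transit example data from Table 4.1 (both variables in thousands).
X = [80, 220, 140, 120, 180, 100, 200, 160]
Y = [0.60, 6.70, 5.30, 4.00, 6.55, 2.15, 6.60, 5.75]

def fit_line(x, y):
    """Return the least squares estimates (b0, b1) for a straight line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1

b0, b1 = fit_line(X, Y)

# Pair each X with its residual and sort by X, as on the horizontal axis
# of a residual plot against X.
by_x = sorted((xi, yi - (b0 + b1 * xi)) for xi, yi in zip(X, Y))
signs = "".join("+" if e > 0 else "-" for _, e in by_x)

for xi, e in by_x:
    print(f"X = {xi:3d}   residual = {e:+.2f}")
print("sign pattern in order of X:", signs)
```

A run of negative signs, then positive signs, then negative signs again is the
numerical counterpart of the curved band of points in the residual plot.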

In this case, both Figures 4.1a and 4.1b are effective means of examining the
appropriateness of the linearity of the regression function. Figure 4.1b, the
residual plot, in general has some advantages over Figure 4.1a, the scatter plot.
First, the residual plot can easily be used for examining other facets of the
aptness of the model. Second, there are occasions when the scaling of the scatter
plot places the $Y_i$ observations close to the fitted values $\hat{Y}_i$, for
instance, when there is a steep slope. It then becomes more difficult to study
the appropriateness of a linear regression function from the scatter plot. A
residual plot, on the other hand, can clearly show any systematic pattern in the
deviations around the regression line under these conditions.


FIGURE 4.1 Scatter plot and residual plot for transit example illustrating
nonlinear regression function

[Figure not reproduced. Panel (a): Scatter Plot. Panel (b): Residual Plot.
Horizontal axis of both panels: Maps Distributed (thousands).]


TABLE 4.1 Number of maps distributed and increase in ridership, transit example

        Increase in     Maps
        Ridership       Distributed     Fitted
City    (thousands)     (thousands)     Value           Residual
i       $Y_i$           $X_i$           $\hat{Y}_i$     $Y_i - \hat{Y}_i = e_i$

1        .60             80             1.66            -1.06
2       6.70            220             7.75            -1.05
3       5.30            140             4.27            +1.03
4       4.00            120             3.40            + .60
5       6.55            180             6.01            + .54
6       2.15            100             2.53            - .38
7       6.60            200             6.88            - .28
8       5.75            160             5.14            + .61

$\hat{Y} = -1.82 + .0435X$


Figure  4.2a  shows  a  prototype  situation  of  the  residual  plot  against  X  if  the 
linear  model  is  appropriate.  The  residuals  should  tend  to  fall  within  a  horizontal 
band  centered  around  0,  displaying  no  systematic  tendencies  to  be  positive  and 
negative. 

Figure 4.2b shows a prototype situation of a departure from the linear
regression model indicating the need for a curvilinear regression function. Here
the residuals tend to vary in a systematic fashion between being positive and
negative. A different type of departure from linearity would, of course, lead to
a different picture than the prototype pattern in Figure 4.2b.


FIGURE 4.2 Prototype residual plots

[Figure not reproduced. Panels (a), (b), (c), and (d) plot the residual e about
0; the horizontal axes are labeled X and Time.]


Nonconstancy  of  error  variance 

A plot of the residuals against the independent variable is helpful not only for
studying whether a linear regression function is appropriate but also for
examining whether the variance of the error terms is constant. For instance,
Figure 4.3a shows a residual plot against the independent variable X for an
application involving the regression of the diastolic blood pressure of female
children (Y) against their age (X). The plot was generated by the BMDP package
(Ref. 4.1). The numerical values shown in the graph indicate the number of
residuals falling on or near a point. We have added the flared lines to highlight
the tendency that the larger X is, the more spread out are the residuals. This
suggests that the error variance is larger for older children than for younger
ones.

Figure 4.2c shows a prototype picture of a residual plot when the error variance
increases with X. In many business, social science, and biological science
applications, departures from constancy of the error variance tend to be of the
trapezoidal type shown in Figure 4.2c. One can also encounter error variances
decreasing with increasing levels of the independent variable or varying in some
other fashion.


FIGURE 4.3 Residual plots for blood pressure example illustrating nonconstant
error variance (BMDP2R, Ref. 4.1)

[Figure not reproduced. Panel (a): Residuals Plotted against X, horizontal axis
labeled AGE. Panel (b): Residuals Plotted against $\hat{Y}$, horizontal axis
labeled PREDICTD.]


A residual plot against the fitted values $\hat{Y}$ is also an effective means of
studying the constancy of the error variance, particularly when the regression
function is not linear or when a multiple regression model is employed. Figure
4.3b shows, for the same data as in Figure 4.3a, a plot of the residuals $e_i$
against the fitted values $\hat{Y}_i$ generated by the BMDP package. Note that the
horizontal axis is labeled PREDICTD, which stands for “predicted,” an alternative
term often used for “fitted” value. Again we see the prototype pattern of Figure
4.2c, suggesting here that the error variance increases with Y. Since the
relation between blood pressure and age is a positive one, Figure 4.3b also
indicates that the error variance increases with X.
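
A rough numerical counterpart to the flared residual plot is to compare the
spread of the residuals for small and large X. The sketch below uses simulated
data, not the blood pressure data, which are not given here; the regression
function, the error standard deviation proportional to X, and the seed are all
assumptions chosen only to produce the pattern of Figure 4.2c.

```python
import random

rng = random.Random(42)                       # arbitrary seed

# Hypothetical data: error standard deviation grows in proportion to X.
x = [1 + 0.2 * i for i in range(200)]         # increasing X values
y = [10 + 2 * xi + rng.gauss(0.0, 0.5 * xi) for xi in x]

def fit_line(xs, ys):
    """Return the least squares estimates (b0, b1) for a straight line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
          / sum((xi - x_bar) ** 2 for xi in xs))
    return y_bar - b1 * x_bar, b1

b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# x is already in increasing order, so the first half of the residuals
# belongs to the smaller X values and the second half to the larger ones.
half = len(x) // 2
spread_low = sum(e ** 2 for e in residuals[:half]) / half
spread_high = sum(e ** 2 for e in residuals[half:]) / (len(x) - half)

print("mean squared residual, smaller X:", round(spread_low, 2))
print("mean squared residual, larger X:", round(spread_high, 2))
```

A markedly larger mean squared residual in the upper half of the X range points
in the same direction as the flared lines in the residual plot. This comparison
is an informal aid only and is not one of the formal tests discussed in this
book.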

Presence  of  outliers 

Outliers are extreme observations. In a residual plot, they are points that lie
far beyond the scatter of the remaining residuals, perhaps four or more standard
deviations from zero. The residual plot in Figure 4.4 presents standardized
residuals and contains one outlier, which is circled. Note that this residual
represents an observation almost six standard deviations from the fitted value.
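
The screening rule just described can be sketched as follows. The data are
hypothetical and were constructed for illustration: thirty observations lie
close to a straight line, and one of them has been deliberately shifted upward.
The cutoff of 4 follows the rough guide of “four or more standard deviations”
and is not a formal test.

```python
import math

# Hypothetical data: a straight line plus a small repeating disturbance.
x = list(range(1, 31))
noise = [0.5 * ((7 * i) % 5 - 2) for i in x]  # values between -1 and 1
y = [2 + 3 * xi + ni for xi, ni in zip(x, noise)]
y[14] += 12.0                                 # contaminate the case at X = 15

def fit_line(xs, ys):
    """Return the least squares estimates (b0, b1) for a straight line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
          / sum((xi - x_bar) ** 2 for xi in xs))
    return y_bar - b1 * x_bar, b1

b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
mse = sum(e ** 2 for e in residuals) / (len(x) - 2)
standardized = [e / math.sqrt(mse) for e in residuals]

# Flag cases whose standardized residual is at least 4 in absolute value.
CUTOFF = 4.0
flagged = [xi for xi, z in zip(x, standardized) if abs(z) >= CUTOFF]

print("largest absolute standardized residual:",
      round(max(abs(z) for z in standardized), 2))
print("X values of flagged cases:", flagged)
```

Note that the outlier itself inflates MSE, so its standardized residual is
smaller than it would be if the scale were estimated from the remaining
observations alone.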

Outliers can create great difficulty. When we encounter one, our first suspicion
is that the observation resulted from a mistake or other extraneous effect, and
hence should be discarded. A major reason for discarding it is that under the
least squares method, a fitted line may be pulled disproportionately toward an
outlying observation because the sum of the squared deviations is minimized.
This could cause a misleading fit if indeed the outlier observation resulted from
a mistake or other extraneous cause. On the other hand, outliers may convey
significant information, as when an outlier occurs because of an interaction with
another independent variable omitted from the model. A safe rule frequently
suggested is to discard an outlier only if there is direct evidence that it
represents an error in recording, a miscalculation, a malfunctioning of
equipment, or a similar type of circumstance.


FIGURE 4.4 Residual plot with outlier

[Figure not reproduced. Vertical axis: Standardized Residual.]


Note 

When a linear regression model is fitted to a data set with a small number of
observations and an outlier is present, the fitted regression may be so distorted by
the outlier that the residual plot suggests a lack of fit of the linear regression model
in addition to flagging the outlier. Figure 4.5 illustrates this situation. The scatter
plot in Figure 4.5a presents a case where all observations except the outlier fall
around a straight-line statistical relationship. When a linear regression function is
fitted to these data, the outlier causes such a shift in the fitted regression line as to
lead to a systematic pattern of deviations from the fitted line for the other
observations, as evidenced by the residual plot in Figure 4.5b.


FIGURE 4.5 Distorting effect on residuals caused by an outlier when remaining data
follow linear regression: (a) scatter plot of Y against X; (b) residual plot against X


Nonindependence of error terms

Whenever data are obtained in a time sequence, it is a good idea to plot the
residuals against time, even though time has not been explicitly incorporated as a
variable into the model. The purpose is to see if there is any correlation between
the error terms over time. In an experiment to study the relation between the
diameter of a weld (X) and the shear strength of the weld (Y), the residual plot
against X as shown in Figure 4.6a appears to indicate no departures from the
simple regression model, either with respect to linearity or constancy of error
variance. When the residuals are plotted in the time order in which the welds
were made in Figure 4.6b, however, an evident correlation between the error
terms stands out. Negative residuals are associated mainly with the early trials,
and positive residuals with the later trials. Apparently, some effect connected
with time was present, such as learning by the welder or a gradual change in the
welding equipment, so that the shear strength tended to be greater in the later
welds on account of this effect.

A prototype of a time-related effect is shown in Figure 4.2d, which portrays a
linear time-related effect. It is sometimes useful to view the problem of
nonindependence of the error terms as one in which an important variable (in this
case, time) has been omitted from the model. We shall discuss this type of
problem shortly.

When the error terms are independent, we would expect the residuals to fluctuate
in a more or less random pattern around the base line 0, such as the scattering
shown in Figure 4.7. Lack of randomness can take the form of too much
alternation of points around the zero line, or too little alternation. In practice,
there is little concern with the former case except in situations where the error
term is subject to periodic or offsetting effects in observations at successive
levels of X. Too little alternation, in contrast, frequently is a matter of concern, as
in the welding example in Figure 4.6b.

Note 

When the residuals are plotted against X, as in Figure 4.1b, the scatter may not appear
to be random. For this plot, however, the basic problem is probably not lack of
independence of the error terms but rather a poorly fitting regression function. This,
indeed, is the situation portrayed in Figure 4.1a.


FIGURE 4.7 Residual plot against time suggesting independence of error terms
(vertical axis: residual e; horizontal axis: time)

Nonnormality  of  error  terms 

As we noted earlier, small departures from normality do not create any serious
problems. Major departures, on the other hand, should be of concern. The
normality of the error terms can be studied informally by examining the residuals in
a variety of graphic ways. One can construct a histogram of the residuals and see
if gross departures from normality are shown by it. Another possibility is to
determine whether, say, about 68 percent of the standardized residuals
$e_i/\sqrt{MSE}$ fall between -1 and +1, or about 90 percent between -1.64 and
+1.64. (If the sample size is small, the corresponding t values would be used.)
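The informal percentage check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it borrows the ten ordered residuals of the library usage example (Table 4.2, later in this section), uses the normal cutoffs 1 and 1.64 quoted above rather than t values, and the names `share_within`, `within_1`, and `within_164` are hypothetical.

```python
import math

# Ordered residuals of the library usage example (Table 4.2, column 1).
residuals = [-1.33, -0.52, -0.27, -0.19, -0.09, 0.09, 0.18, 0.44, 0.66, 1.03]

n = len(residuals)
# For simple linear regression, MSE divides the sum of squared residuals by n - 2.
mse = sum(e * e for e in residuals) / (n - 2)
root_mse = math.sqrt(mse)

# Standardized residuals e_i / sqrt(MSE).
standardized = [e / root_mse for e in residuals]


def share_within(values, bound):
    """Fraction of values lying in the interval [-bound, +bound]."""
    return sum(1 for v in values if abs(v) <= bound) / len(values)


within_1 = share_within(standardized, 1.0)     # compare with about 68 percent
within_164 = share_within(standardized, 1.64)  # compare with about 90 percent

print(f"within +/-1:    {within_1:.0%}")
print(f"within +/-1.64: {within_164:.0%}")
```

With only 10 observations the percentages move in steps of 10 points, so a check of this kind is rough at best; as the text notes, t values would be the more appropriate cutoffs for so small a sample.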

Still  another  possibility  is  to  prepare  a  normal  probability  plot  of  the  residuals. 
Here  the  residuals  are  plotted  against  their  expected  values  when  the  distribution 
is  normal.  A  plot  which  is  nearly  linear  suggests  agreement  with  normality 
whereas  a  plot  which  departs  substantially  from  linearity  suggests  that  the  error 
distribution  is  not  normal. 

Table 4.2, column 1, contains the residuals, in ascending order, for a regression
study of per capita library usage (Y) and size of city (X) in 10 cities. To find
the expected values of the ordered residuals under normality, we utilize the facts
that (1) the expected value of the error terms for regression model (3.1) is zero
and (2) the standard deviation of the error terms is estimated by $\sqrt{MSE}$.
Statistical theory states that for a normal random variable with mean 0 and
estimated standard deviation $\sqrt{MSE}$, the expected value of the ith smallest
observation in a random sample of n is given approximately by the following
expression:

(4.6)    $\sqrt{MSE}\left[z\left(\frac{i - .375}{n + .25}\right)\right]$

where z(A) as usual denotes the (A)100 percentile of the standard normal
distribution.


TABLE 4.2 Residuals and expected values under normality for
library usage example

                       (1)               (2)
  Ascending          Ordered        Expected Value
    Order            Residual       under Normality
      i                e_i
      1               -1.33             -1.08
      2                -.52              -.70
      3                -.27              -.46
      4                -.19              -.26
      5                -.09              -.08
      6                 .09               .08
      7                 .18               .26
      8                 .44               .46
      9                 .66               .70
     10                1.03              1.08

Squaring the residuals in Table 4.2, summing, and dividing by n - 2, we
obtain MSE = .4859, so $\sqrt{MSE}$ = .697. For the smallest residual, we have
i = 1. Hence, (i - .375)/(n + .25) = (1 - .375)/(10 + .25) = .061, and the
expected value of the smallest residual under normality is:

.697[z(.061)] = .697(-1.55) = -1.08

Similarly, the expected value of the second smallest residual under normality
is obtained by finding, for i = 2, (i - .375)/(n + .25) = (2 - .375)/(10 + .25)
= .159 so that:

.697[z(.159)] = .697(-1.00) = -.70

Because of the symmetry of a normal probability distribution, the expected
values of the largest and second largest residuals are 1.08 and .70, respectively.

Table 4.2, column 2, contains all 10 expected values under the assumption of
normality. Figure 4.8 presents a plot of the residuals against their expected
values under normality. This plot is called a normal probability plot. Note that the
points in Figure 4.8 fall reasonably close to a straight line, suggesting that the
error terms are approximately normally distributed.
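The calculation behind column 2 of Table 4.2 can be sketched in Python as follows. The sketch applies approximation (4.6) to the residuals of column 1; `statistics.NormalDist().inv_cdf` from the standard library stands in for the percentile z(A), and the function name `expected_under_normality` is a hypothetical choice, not something taken from the text.

```python
import math
from statistics import NormalDist

# Ordered residuals of the library usage example (Table 4.2, column 1).
residuals = [-1.33, -0.52, -0.27, -0.19, -0.09, 0.09, 0.18, 0.44, 0.66, 1.03]

n = len(residuals)
root_mse = math.sqrt(sum(e * e for e in residuals) / (n - 2))


def expected_under_normality(i, n, root_mse):
    """Approximation (4.6): sqrt(MSE) * z((i - .375) / (n + .25))."""
    return root_mse * NormalDist().inv_cdf((i - 0.375) / (n + 0.25))


expected = [expected_under_normality(i, n, root_mse) for i in range(1, n + 1)]

# Points (expected value, residual) are the coordinates of the normal probability plot.
for i, (e, ev) in enumerate(zip(residuals, expected), start=1):
    print(f"{i:2d}  residual {e:6.2f}  expected {ev:6.2f}")
```

The printed values should agree with column 2 of Table 4.2 to within about .01; small differences in the last digit can arise because the hand calculation rounds z to two decimals before multiplying.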

Many  computer  packages  will  prepare  normal  probability  plots  at  the  option 
of  the  user.  Some  of  these  plots  utilize  standardized  residuals,  but  this  does  not 
affect  the  basic  nature  of  the  plot. 


FIGURE 4.8 Normal probability plot of residuals, library usage example
(vertical axis: residual; horizontal axis: expected value)

One method of assessing the linearity of the normal probability plot is to
calculate the coefficient of correlation (3.73) relating the residuals $e_i$ to their
expected values under normality. A high value of the correlation coefficient, say,
.90 or more, is indicative of normality. In our library usage example in Table
4.2, the coefficient of correlation is .981, supporting the conclusion of
approximate normality for the error terms.
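As a rough check on the figure just quoted, the correlation between the two columns of Table 4.2 can be computed directly. The sketch below uses the ordinary product-moment formula; the helper name `pearson_r` is an assumption for illustration, and formula (3.73) itself is not reproduced here.

```python
import math

# Table 4.2: ordered residuals (column 1) and expected values under normality (column 2).
residuals = [-1.33, -0.52, -0.27, -0.19, -0.09, 0.09, 0.18, 0.44, 0.66, 1.03]
expected = [-1.08, -0.70, -0.46, -0.26, -0.08, 0.08, 0.26, 0.46, 0.70, 1.08]


def pearson_r(x, y):
    """Product-moment coefficient of correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(expected, residuals)
print(f"correlation between residuals and expected values: {r:.3f}")
```

The result can be compared with the .981 reported in the text and with the informal benchmark of .90 or more.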

The analysis for model departures with respect to normality is, in many
respects, more difficult than that for other types of departures. In the first place,
random variation can be particularly mischievous when one studies the nature of
a probability distribution unless the sample size is quite large. Even worse, other
types of departures can and do affect the distribution of the residuals. For
instance, residuals may appear to be not normally distributed because an
inappropriate regression function is used or because the error variance is not constant.
Hence, it is usually a good strategy to investigate these other types of departures
first, before concerning oneself with the normality of the error terms.


Omission  of  important  independent  variables 

Residuals should be plotted against variables omitted from the model that
might have important effects on the response, data being available. The time
variable cited earlier in the welding application is an example. The purpose of
this additional analysis is to determine whether there are any other key
independent variables that could provide important additional descriptive and
predictive power to the model.

As another example, in a study to predict output by piece rate workers in an
assembling operation, the relation between output (Y) and age (X) of worker was


Figures  4.9b  and  c.  Note  that  the  residuals  for  machines  made  by  Company  A 
tend  to  be  positive,  while  those  for  machines  made  by  Company  B  tend  to  be 
negative.  Thus,  type  of  machine  appears  to  have  a  definite  effect  on  productivity, 
and  output  predictions  may  turn  out  to  be  far  superior  when  this  independent 
variable  is  added  to  the  model.  While  this  example  dealt  with  a  classification 
variable  (type  of  machine),  the  residual  analysis  for  an  additional  quantitative 
variable  is  completely  analogous.  One  would  simply  plot  the  residuals  against 
the  additional  independent  variable  and  see  whether  or  not  the  residuals  tend  to 
vary  systematically  with  the  level  of  the  additional  independent  variable. 

Note 

We  do  not  say  that  the  original  model  is  “wrong”  when  it  can  be  improved  materially 
by  adding  one  or  more  independent  variables.  Only  a  few  of  the  factors  operating  on  any 
dependent  variable  Y  in  real-world  situations  can  be  included  explicitly  in  a  regression 
model.  The  chief  purpose  of  residual  analysis  in  identifying  other  important  independent 
variables  is  therefore  to  test  the  adequacy  of  the  model  and  see  whether  it  could  be 
improved  materially  by  adding  one  or  a  few  independent  variables. 


Comments 

1.  We  discussed  the  model  departures  one  at  a  time.  In  actuality,  several  types  of 
departures  may  occur  together.  For  instance,  a  linear  regression  function  may  be  a  poor  fit 
and  the  variance  of  the  error  terms  may  not  be  constant.  In  these  cases,  the  prototype 
patterns  of  Figure  4.2  can  still  be  useful,  but  they  would  need  to  be  combined  into 
composite  patterns. 

2.  While  graphic  analysis  of  residuals  is  only  an  informal  method  of  analysis,  in  many 
cases  it  suffices  for  examining  the  aptness  of  a  model. 

3.  The  basic  approach  to  residual  analysis  applies  not  only  to  simple  linear  regression 
but  also  to  more  complex  regression  and  other  types  of  statistical  models. 

4. Most of the routine work in residual analysis can be handled on computers. Almost
all regression programs supply the fitted values and corresponding residuals, and routines
are generally available whereby the various types of residual plots can be obtained
optionally on printout.


4.3  TESTS  INVOLVING  RESIDUALS 

Graphic analysis of residuals is inherently subjective. Nevertheless, subjective
analysis of a variety of interrelated residual plots will frequently reveal
difficulties in the model more clearly than particular tests. There are occasions,
however, when one wishes to put specific questions to a test. We now review
some of the relevant tests briefly, and take up one new type of test.

Most statistical tests require independent observations. As we have seen,
however, the residuals are dependent. Fortunately, the dependency becomes quite
small for large samples, so that one can usually then ignore it.




Tests  for  randomness 

The runs test is frequently used to test for lack of randomness in the residuals
arranged in time order. Another test, specifically designed for lack of randomness
in least squares residuals, is the Durbin-Watson test. This test will be
discussed in Chapter 13.
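A minimal sketch of a runs test on the signs of time-ordered residuals is given below. Several things here are assumptions rather than material from the text: the large-sample normal approximation to the number of runs (the Wald-Wolfowitz form), the dropping of zero residuals, the function name `runs_test`, and the twelve residuals themselves, which are made up to mimic the "too little alternation" pattern of the welding example.

```python
import math
from statistics import NormalDist


def runs_test(residuals):
    """Runs test on residual signs in time order (normal approximation).

    Returns (number of runs, z statistic, two-sided P-value).
    Zero residuals are dropped; both signs must occur.
    """
    signs = [e > 0 for e in residuals if e != 0]
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("residuals of both signs are required")
    # A new run starts whenever the sign changes from one residual to the next.
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n = n_pos + n_neg
    mean_runs = 2 * n_pos * n_neg / n + 1
    var_runs = 2 * n_pos * n_neg * (2 * n_pos * n_neg - n) / (n * n * (n - 1))
    z = (runs - mean_runs) / math.sqrt(var_runs)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return runs, z, p_value


# Hypothetical residuals in time order: negative early, positive late.
example = [-3.1, -2.4, -1.8, -2.2, -0.9, -0.4, 0.6, 1.1, 0.8, 1.9, 2.5, 2.1]
runs, z, p_value = runs_test(example)
print(f"runs = {runs}, z = {z:.2f}, two-sided P-value = {p_value:.4f}")
```

For samples as small as this one, exact tables for the runs test would normally be preferred over the normal approximation; the sketch is meant only to show the mechanics.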

Tests  for  constancy  of  variance 

When a residual plot gives the impression that the variance may be increasing
or decreasing in a systematic manner related to X or E(Y), a simple test is to fit
separate regression functions to each half of the observations arranged by level of
X, calculate error mean squares for each, and test for equality of the error
variances by an F test. Another simple test is by means of rank correlation between
the absolute value of the residual and the value of the independent variable.
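The rank correlation approach might be sketched as follows. The sketch assumes Spearman's rank correlation computed as the product-moment correlation of ranks, with tied values given average ranks; the helper names and the eight data points are hypothetical and chosen so that the spread of the residuals grows with X.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of the 1-based ranks pos+1 .. end+1
        for k in order[pos:end + 1]:
            ranks[k] = shared
        pos = end + 1
    return ranks


def pearson_r(x, y):
    """Product-moment coefficient of correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_corr_abs_residual_vs_x(x, residuals):
    """Rank correlation between |residual| and the independent variable."""
    return pearson_r(average_ranks(x), average_ranks([abs(e) for e in residuals]))


# Hypothetical data: residual spread increases with X.
x = [10, 20, 30, 40, 50, 60, 70, 80]
residuals = [0.5, -0.8, 1.1, -1.9, 2.4, -2.2, 3.5, -4.0]
rho = rank_corr_abs_residual_vs_x(x, residuals)
print(f"rank correlation of |e| with X: {rho:.3f}")
```

A rank correlation near +1 or -1 would point toward a variance that changes systematically with X; judging its significance requires the tables or approximations given in basic statistics texts.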

Tests  for  outliers 

A  simple  test  for  an  outlier  observation  involves  fitting  a  new  regression  line 
to  the  other  n  —  1  observations.  The  suspect  observation,  which  was  not  used  in 
fitting  the  new  line,  can  now  be  regarded  as  a  new  observation.  One  can  calculate 
the  probability  that  in  n  observations,  a  deviation  from  the  fitted  line  as  great  as 
that  of  the  outlier  will  be  obtained  by  chance.  If  this  probability  is  sufficiently 
small,  the  outlier  can  be  rejected  as  not  having  come  from  the  same  population  as 
the  other  n  —  1  observations.  Otherwise,  the  outlier  is  retained. 

Many other tests to aid in evaluating outliers have been developed. These are
discussed in specialized references such as Reference 4.2 and in statistical
journals.

Tests  for  normality 

Goodness of fit tests can be used for examining the normality of the error
terms. For instance, the chi-square test or the Kolmogorov-Smirnov test can be
employed for testing the normality of the error terms by analyzing the residuals.

Note

The runs test, rank correlation, and goodness of fit tests, mentioned above, are
commonly used statistical procedures which are discussed in many basic statistics texts.

4.4  F  TEST  FOR  LACK  OF  FIT 

We now take up a formal test for determining whether or not a specified
regression function adequately fits the data. This lack of fit test assumes that the
observations Y for given X are (1) independent and (2) normally distributed, and
(3) the distributions of Y have the same variance $\sigma^2$. We illustrate this test for
ascertaining whether or not a linear regression function is a good fit for the data.




Replications 

The  lack  of  fit  test  requires  repeat  observations  at  one  or  more  X  levels.  In 
nonexperimental  data,  these  may  occur  fortuitously,  as  when  in  a  productivity 
study  relating  workers’  output  and  age,  several  workers  of  the  same  age  happen 
to  be  included  in  the  study.  In  an  experiment,  one  can  assure  by  design  that  there 
are  repeat  observations.  For  instance,  in  an  experiment  on  the  effect  of  size  of 
salesperson  bonus  on  sales,  three  salespersons  can  be  offered  a  particular  size  of 
bonus,  for  each  of  six  bonus  sizes,  and  their  sales  then  observed. 

Repeated trials for the same level of the independent variable, of the type
described, are called replications. The resulting observations are called
replicates.

Example 

In  an  experiment  involving  12  similar  but  scattered  suburban  branch  offices  of 
a  commercial  bank,  holders  of  checking  accounts  at  the  offices  were  offered  gifts 
for  setting  up  savings  accounts  at  these  same  offices.  The  initial  deposit  in  the 
new  savings  account  had  to  be  for  a  specified  minimum  amount  to  qualify  for  the 
gift.  The  value  of  the  gift  was  directly  proportional  to  the  specified  minimum 
deposit.  Various  levels  of  minimum  deposit  and  related  gift  values  were  used  in 
the  experiment  in  order  to  ascertain  the  relation  between  the  specified  minimum 
deposit  and  gift  value  on  the  one  hand  and  number  of  accounts  opened  at  the 
office  on  the  other.  Altogether,  six  levels  of  minimum  deposit  and  proportional 
gift  value  were  used,  with  two  of  the  branch  offices  assigned  at  random  to  each 
level.  One  branch  office  had  a  fire  during  the  period  and  was  dropped  from  the 
study.  Table  4.3  contains  the  results,  where  X  is  the  amount  of  minimum  deposit 
and  Y  is  the  number  of  new  savings  accounts  that  were  opened  and  qualified  for 
the  gift  during  the  test  period. 


TABLE 4.3 Data for bank example

                  Size of Minimum       Number of
  Observation    Deposit (dollars)     New Accounts
       i               X_i                 Y_i
       1               125                 160
       2               100                 112
       3               200                 124
       4                75                  28
       5               150                 152
       6               175                 156
       7                75                  42
       8               175                 124
       9               125                 150
      10               200                 104
      11               100                 136



A  linear  regression  function  was  fitted  in  the  usual  fashion;  it  is  (calculations 
not  shown): 

$\hat{Y} = 50.72251 + .48670X$

The analysis of variance table also was obtained and is shown in Table 4.4. A
scatter plot, together with the fitted regression line, is shown in Figure 4.10. The
indications are strong that a linear regression function is inappropriate. To test
this formally, we need to perform a decomposition of the error sum of squares
SSE in Table 4.4 into two components called the pure error and lack of fit
components.
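The calculations that the text omits can be sketched in Python from the data in Table 4.3. The sketch uses the usual least squares formulas for simple linear regression; variable names such as `sxx` and `sxy` are illustrative choices rather than notation from the text.

```python
# Bank example (Table 4.3): (minimum deposit X in dollars, number of new accounts Y).
data = [(125, 160), (100, 112), (200, 124), (75, 28), (150, 152), (175, 156),
        (75, 42), (175, 124), (125, 150), (200, 104), (100, 136)]

n = len(data)
x_bar = sum(x for x, _ in data) / n
y_bar = sum(y for _, y in data) / n

# Corrected sums of squares and cross products.
sxx = sum((x - x_bar) ** 2 for x, _ in data)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in data)

# Least squares estimates of the slope and intercept.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Analysis of variance quantities.
ssto = sum((y - y_bar) ** 2 for _, y in data)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in data)
ssr = ssto - sse
mse = sse / (n - 2)

print(f"fitted line: Y-hat = {b0:.5f} + {b1:.5f} X")
print(f"SSR = {ssr:,.1f}   SSE = {sse:,.1f}   SSTO = {ssto:,.1f}   MSE = {mse:,.1f}")
```

The printed values can be compared with the fitted regression function above and with the entries of Table 4.4.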


TABLE 4.4 ANOVA table for bank example

  Source of
  Variation          SS              df         MS
  Regression    SSR =  5,141.3        1    MSR = 5,141.3
  Error         SSE = 14,741.6        9    MSE = 1,638.0
  Total         SSTO = 19,882.9      10


FIGURE  4.10  Scatter  plot  and  fitted  regression  line — bank  example 


Decomposition  of  SSE 

Pure error component. The basic idea for the first component of SSE rests
on the fact that there are replications at some levels of X. Let us denote the
different X levels in the study, whether or not replicated observations are present,
as $X_1, \ldots, X_c$. For our example, c = 6 since there are six minimum deposit size
levels in the study, for five of which there are two observations and for one there
is a single observation. We shall let $X_1 = 75$ (the smallest minimum deposit
level), $X_2 = 100, \ldots, X_6 = 200$. Further, we shall denote the number of
observations for the jth level of X as $n_j$; for our example, $n_1 = n_2 = n_3 = n_5 = n_6 = 2$
and $n_4 = 1$. Thus, the total number of observations n is given by:

(4.7)    $n = \sum_{j=1}^{c} n_j$

If we make no assumption about the nature of the regression function but
assume all other elements of model (3.1), we can still estimate the error variance
$\sigma^2$ because of the repeated observations. Table 4.5 presents the same data as
Table 4.3, but in a different arrangement. Table 4.5 also shows the mean of the Y
observations for each minimum deposit size. We shall denote the mean of the Y
observations when $X = X_j$ by $\bar{Y}_j$. Thus $\bar{Y}_1 = 35$ for the two branches with
minimum deposits of $X_1 = \$75$, and so on.

Since the two Y observations for $X_1 = \$75$ come from the same probability
distribution, we can estimate the variance of this distribution by calculating the
usual sample variance, using the deviations around $\bar{Y}_1 = 35$:

$\frac{(28 - 35)^2 + (42 - 35)^2}{2 - 1} = 98$

Likewise, the two observations for $X_2 = \$100$ come from the same probability
distribution, so that we can estimate the variance of this distribution by
calculating the sample variance:

$\frac{(112 - 124)^2 + (136 - 124)^2}{2 - 1} = 288$

Similarly, we can estimate the variance of each of the other distributions
except for the one at $X_4 = 150$ where there is only a single observation. Since
model (3.1) assumes that all probability distributions of Y have the same variance
$\sigma^2$, we can combine the results for each of the X levels. The optimum way of
combining is to add the numerators:

$(28 - 35)^2 + (42 - 35)^2 + (112 - 124)^2 + (136 - 124)^2$
$+ (160 - 155)^2 + (150 - 155)^2 + (156 - 140)^2 + (124 - 140)^2$
$+ (124 - 114)^2 + (104 - 114)^2 = 1,148$

then add the denominators:

1 + 1 + 1 + 1 + 1 = 5

and finally take the ratio:

$\frac{1,148}{5} = 229.6$


TABLE 4.5 Data for bank example, arranged by observation number and minimum
deposit

                           Size of Minimum Deposit (dollars)
  Observation   X_1 = 75   X_2 = 100   X_3 = 125   X_4 = 150   X_5 = 175   X_6 = 200
  i = 1            28         112         160         152         156         124
  i = 2            42         136         150                     124         104
  Mean Ybar_j      35         124         155         152         140         114

To generalize, let us denote the ith observation for the jth level of X by $Y_{ij}$,
where $i = 1, \ldots, n_j$; $j = 1, \ldots, c$. For our example (Table 4.5), $Y_{11} = 28$,
$Y_{21} = 42$, $Y_{12} = 112$, and so on. First, we calculate the sum of squares of the
deviations from the mean at any given level of X. For $X = X_j$, this sum of squares is:

(4.8)    $\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2$

We then add these sums of squares over all levels of X and denote this sum of the
sums of squares by SSPE:

(4.9)    $SSPE = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2$

Here SSPE stands for pure error sum of squares. Note that when there is only a
single observation at $X_j$, we have $Y_{ij} = \bar{Y}_j$ so that $Y_{ij} - \bar{Y}_j = 0$. Hence, such $X_j$
levels do not contribute to the pure error sum of squares, as was illustrated by our
example.

The degrees of freedom associated with SSPE are n - c. This is easy to see
since there are as usual $n_1 - 1$ degrees of freedom associated with the sum of
squares for $X_1$, $n_2 - 1$ degrees of freedom with the sum of squares for $X_2$, and so
on. The sum of the degrees of freedom is:

(4.10)    $\sum_{j=1}^{c} (n_j - 1) = \sum_{j=1}^{c} n_j - c = n - c$

Again, we see that $X_j$ levels for which $n_j = 1$ do not contribute to the degrees of
freedom since $n_j - 1 = 0$ then.

The pure error mean square MSPE is given by:

(4.11)    $MSPE = \frac{SSPE}{n - c}$

The reason for the term “pure error” is that MSPE is an unbiased estimator of
the error variance $\sigma^2$ no matter what is the nature of the regression function.
MSPE measures the variability of the distributions of Y without relying on any
assumptions about the nature of the regression relation; hence, it is a “pure”
measure of the error variance.
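Formulas (4.9) through (4.11) can be sketched in Python for the bank example. The grouping of observations by X level with a `defaultdict` and the names `groups`, `sspe`, `df_pe`, and `mspe` are illustrative assumptions; the data are those of Table 4.3.

```python
from collections import defaultdict

# Bank example (Table 4.3): (minimum deposit X, number of new accounts Y).
data = [(125, 160), (100, 112), (200, 124), (75, 28), (150, 152), (175, 156),
        (75, 42), (175, 124), (125, 150), (200, 104), (100, 136)]

# Collect the Y observations at each distinct X level.
groups = defaultdict(list)
for x, y in data:
    groups[x].append(y)

# (4.9): sum over levels of the squared deviations around the level mean.
sspe = 0.0
for x_level, ys in groups.items():
    level_mean = sum(ys) / len(ys)
    # A level with a single observation contributes zero here.
    sspe += sum((y - level_mean) ** 2 for y in ys)

n = len(data)    # total number of observations
c = len(groups)  # number of distinct X levels
df_pe = n - c    # (4.10): degrees of freedom for pure error
mspe = sspe / df_pe  # (4.11)

print(f"SSPE = {sspe:,.1f}, df = {df_pe}, MSPE = {mspe:.1f}")
```

Note that the regression function never enters this calculation, which is the sense in which MSPE is a "pure" estimate of the error variance.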

Lack of fit component. The second component of SSE is:

(4.12)    $SSLF = SSE - SSPE$

where SSLF denotes lack of fit sum of squares. It can be shown that:

(4.12a)    $SSLF = \sum_{j=1}^{c} n_j (\bar{Y}_j - \hat{Y}_j)^2$

where $\hat{Y}_j$ denotes the fitted value when $X = X_j$. Thus, SSLF is a weighted sum of
squares (the weights are the sample sizes $n_j$) of the deviations:

(4.13)    $\bar{Y}_j - \hat{Y}_j$

Note that these deviations represent the difference between the mean $\bar{Y}_j$ and the
fitted value $\hat{Y}_j$ based on the regression model. The closer the $\bar{Y}_j$ are to the $\hat{Y}_j$, the
greater is the evidence that the fitted regression function is a good fit and
therefore appropriate. The further the $\bar{Y}_j$ deviate from the $\hat{Y}_j$, the more the indication
that the fitted regression function is inappropriate.

Figure 4.11 illustrates for the observation $X_3 = 125$, $Y_{13} = 160$, the partitioning
of the error deviation $Y_{13} - \hat{Y}_3 = 48$ into the pure error deviation $Y_{13} - \bar{Y}_3 = 5$
and the lack of fit deviation $\bar{Y}_3 - \hat{Y}_3 = 43$ for testing whether or not a linear
regression function is a good fit.
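The equivalence of (4.12) and (4.12a), and the partitioning of a single error deviation, can be checked numerically. The following Python sketch refits the line to the Table 4.3 data so that it stands on its own; all variable names are illustrative assumptions.

```python
from collections import defaultdict

# Bank example (Table 4.3): (minimum deposit X, number of new accounts Y).
data = [(125, 160), (100, 112), (200, 124), (75, 28), (150, 152), (175, 156),
        (75, 42), (175, 124), (125, 150), (200, 104), (100, 136)]

n = len(data)
x_bar = sum(x for x, _ in data) / n
y_bar = sum(y for _, y in data) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in data)
      / sum((x - x_bar) ** 2 for x, _ in data))
b0 = y_bar - b1 * x_bar

groups = defaultdict(list)
for x, y in data:
    groups[x].append(y)

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in data)
sspe = sum((y - sum(ys) / len(ys)) ** 2 for ys in groups.values() for y in ys)

# (4.12): lack of fit sum of squares by subtraction.
sslf_by_subtraction = sse - sspe
# (4.12a): weighted sum of squared deviations of level means from fitted values.
sslf_direct = sum(len(ys) * (sum(ys) / len(ys) - (b0 + b1 * x)) ** 2
                  for x, ys in groups.items())

# Partition of the error deviation for Y_13 = 160 at X_3 = 125.
x3, y13 = 125, 160
y_bar3 = sum(groups[x3]) / len(groups[x3])
y_hat3 = b0 + b1 * x3
error_dev = y13 - y_hat3      # Y_13 - Y-hat_3
pure_dev = y13 - y_bar3       # Y_13 - Y-bar_3
lack_dev = y_bar3 - y_hat3    # Y-bar_3 - Y-hat_3

print(f"SSLF by (4.12) = {sslf_by_subtraction:,.1f}, by (4.12a) = {sslf_direct:,.1f}")
print(f"{error_dev:.2f} = {pure_dev:.2f} + {lack_dev:.2f}")
```

The two routes to SSLF agree up to floating-point rounding, and the unrounded deviations correspond to the rounded values 48, 5, and 43 quoted for Figure 4.11.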


FIGURE 4.11 Illustration of decomposition of $Y_{ij} - \hat{Y}_j$ (vertical axis: number of
new accounts; horizontal axis: size of minimum deposit, dollars)


There are c - 2 degrees of freedom associated with SSLF when testing for
lack of fit of a linear regression function. The reason is that there are c levels of
X and two degrees of freedom are lost because two parameters ($\beta_0$ and $\beta_1$) are
estimated in obtaining the fitted values $\hat{Y}_j$. Thus, the lack of fit mean square
MSLF is:

(4.14)    $MSLF = \frac{SSLF}{c - 2}$

For our example, using (4.12) and the earlier results (SSE = 14,741.6 from
Table 4.4, SSPE = 1,148.0 as computed above), we obtain:

SSLF = 14,741.6 - 1,148.0 = 13,593.6

and:

$MSLF = \frac{13,593.6}{6 - 2} = 3,398.4$

Table  4.6a  contains  the  general  ANOVA  table  including  the  decomposition  of 
SSE  just  explained  and  the  mean  squares  of  interest,  and  Table  4.6b  contains  the 
ANOVA  decomposition  for  our  example. 

TABLE 4.6 ANOVA table for testing lack of fit of simple linear regression function

(a) General

  Source of
  Variation         SS        df        MS
  Regression        SSR       1         MSR
  Error             SSE       n - 2     MSE
    Lack of fit     SSLF      c - 2     MSLF
    Pure error      SSPE      n - c     MSPE
  Total             SSTO      n - 1

(b) Bank Example

  Source of
  Variation         SS                   df     MS
  Regression        SSR  =  5,141.3       1     MSR  = 5,141.3
  Error             SSE  = 14,741.6       9     MSE  = 1,638.0
    Lack of fit     SSLF = 13,593.6       4     MSLF = 3,398.4
    Pure error      SSPE =  1,148.0       5     MSPE =   229.6
  Total             SSTO = 19,882.9      10


F  test 

Test statistic. For testing lack of fit of the regression function, the
appropriate test statistic is:

(4.15)    $F^* = \frac{MSLF}{MSPE}$

We noted that MSPE has expectation $\sigma^2$ no matter what is the nature of the
regression function. It can be shown that for testing lack of fit of a simple linear
regression function:

(4.16)    $E(MSLF) = \sigma^2 + \frac{\sum n_j [E(Y_j) - (\beta_0 + \beta_1 X_j)]^2}{c - 2}$

where $E(Y_j)$ denotes the true mean of the distribution of Y when $X = X_j$ and
$\beta_0 + \beta_1 X_j$ is the mean response indicated by the linear regression model. If the
regression function is linear, the second term in (4.16) is 0 so that $E(MSLF) =
\sigma^2$ then. On the other hand, if the regression function is not linear, $E(Y_j) \neq
\beta_0 + \beta_1 X_j$ so that $E(MSLF)$ will be greater than $\sigma^2$. Hence, a value of $F^*$ near 1
accords with a linear regression function; large values of $F^*$ indicate that the
regression function is not linear.


Decision rule. Since SSLF and SSPE are additive, as are the degrees of
freedom, we know from Cochran's theorem that $F^*$ follows the F(c - 2; n - c)
distribution if the regression function is linear and all other conditions of model
(3.1) hold. To decide between:

(4.17)    $H_0$: $E(Y) = \beta_0 + \beta_1 X$
          $H_a$: $E(Y) \neq \beta_0 + \beta_1 X$

we use the test statistic (4.15). The decision rule to control the risk of a Type I
error at $\alpha$ is:

(4.18)    If $F^* \leq F(1 - \alpha; c - 2, n - c)$, conclude $H_0$
          If $F^* > F(1 - \alpha; c - 2, n - c)$, conclude $H_a$


Example. For our example, the test statistic can be constructed easily from
the results in Table 4.6b:

$F^* = \frac{3,398.4}{229.6} = 14.80$

If the level of significance is to be $\alpha = .01$, we require F(.99; 4, 5) = 11.4.
Since $F^* = 14.80 > 11.4$, we conclude $H_a$, that the regression function is not
linear. This, of course, accords with our visual impression from Figure 4.10. To
report the P-value for the test statistic, we note that $F^* = 14.80$ lies between
F(.99; 4, 5) = 11.4 and F(.995; 4, 5) = 15.6 and thus the P-value must be
between .005 and .01. The exact P-value can be shown to be .006.
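The test statistic and its P-value can be reproduced with a short Python sketch. The standard library has no F distribution, so the sketch integrates numerically: it relies on the identity P(F > f) = I_y(df2/2, df1/2) with y = df2/(df2 + df1 f), where I is the regularized incomplete beta function, and evaluates the integral by Simpson's rule. The function `f_upper_tail` and its `steps` parameter are hypothetical names, and the routine assumes both degrees of freedom are at least 2; the critical value 11.4 is the one quoted in the text.

```python
import math


def f_upper_tail(f_star, df1, df2, steps=2000):
    """P(F > f_star) for F(df1, df2), by Simpson's rule (steps must be even).

    Sketch only: assumes df1 >= 2 and df2 >= 2 so the integrand stays bounded.
    """
    a, b = df2 / 2, df1 / 2
    y = df2 / (df2 + df1 * f_star)

    def integrand(t):
        # Unnormalized Beta(a, b) density.
        return t ** (a - 1) * (1 - t) ** (b - 1)

    h = y / steps
    total = integrand(0.0) + integrand(y)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    integral = total * h / 3
    beta_ab = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return integral / beta_ab


# Mean squares from Table 4.6b.
mslf, mspe = 3398.4, 229.6
df_lf, df_pe = 4, 5

f_star = mslf / mspe
p_value = f_upper_tail(f_star, df_lf, df_pe)

critical_value = 11.4  # F(.99; 4, 5), as quoted in the text
conclusion = "Ha" if f_star > critical_value else "H0"

print(f"F* = {f_star:.2f}, P-value = {p_value:.3f}, conclude {conclusion}")
```

Where a statistics library is available, its F distribution routines would be the more usual choice; the numerical integration here merely keeps the sketch self-contained.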

Comments 


1. As was shown by our example, not all levels of X need have repeat observations for
the F test for lack of fit to be applicable. Repeat observations at only one or some levels of
X are adequate.

2. The F test for lack of fit falls into the framework of a general linear test discussed in
Section 3.8. The full model is:

(4.19)    For each $X_j$, Y is normal with mean $E(Y_j)$ and variance $\sigma^2$.

The least squares estimator of $E(Y_j)$ for the full model is $\bar{Y}_j$ so that the error sum of
squares is:

(4.20)    $SSE(F) = \sum_j \sum_i (Y_{ij} - \bar{Y}_j)^2 = SSPE$

which has associated with it n - c degrees of freedom.

Since $H_0$ states:

(4.21)    $H_0$: $E(Y) = \beta_0 + \beta_1 X$

the error sum of squares for the reduced model is:

(4.22)    $SSE(R) = \sum_j \sum_i (Y_{ij} - \hat{Y}_j)^2 = SSE$

which has associated with it n - 2 degrees of freedom. Substituting into (3.68) and
utilizing (4.12), we obtain:

(4.23)    $F^* = \frac{SSE - SSPE}{(n - 2) - (n - c)} \div \frac{SSPE}{n - c} = \frac{SSLF}{c - 2} \div \frac{SSPE}{n - c} = \frac{MSLF}{MSPE}$

the same test statistic as in (4.15).

3. Suppose that prior to any analysis of the aptness of the model, we had wished to
test whether or not $\beta_1 = 0$ for the data underlying Table 4.4. The test statistic (3.59)
would be:

$F^* = \frac{MSR}{MSE} = \frac{5,141.3}{1,638.0} = 3.14$

For $\alpha = .10$, F(.90; 1, 9) = 3.36, and we would conclude $H_0$, that $\beta_1 = 0$ or that there is
no linear association between minimum deposit size (and value of gift) and number of
new accounts. A conclusion that there is no relation between these variables would be
improper, however. Such an inference requires that model (3.1) is appropriate. Here it is
not, as we have seen, because the regression function is not linear. There exists indeed a
(curvilinear) relation between minimum deposit size and number of new accounts, and
testing whether or not $\beta_1 = 0$ under these circumstances has entirely different
implications. This illustrates the importance of always examining the aptness of a model before
further inferences are drawn.

4. The F test approach just explained can be used to test the aptness of other
regression functions, not just the simple linear one in (4.17). Only the degrees of freedom
for SSLF will need be modified. In general, $c - p$ degrees of freedom are associated with
SSLF, where $p$ is the number of parameters in the regression function. For the test of a
simple linear regression function, $p = 2$ because there are two parameters, $\beta_0$ and
$\beta_1$, in the regression function.

5.  The  alternative  Ha  in  (4.17)  includes  all  regression  functions  other  than  a  linear 
one.  For  instance,  it  includes  a  quadratic  regression  function  or  a  logarithmic  one.  If  Ha  is 
concluded,  a  study  of  residuals  can  be  helpful  in  identifying  an  appropriate  function. 

6.  Clearly,  repeat  observations  are  most  valuable  whenever  we  are  not  certain  of  the 
nature  of  the  regression  function.  If  at  all  possible,  provision  should  be  made  for  some 
replications. 




7. If we conclude that the employed model in H0 is appropriate, the usual practice is
to use the error mean square MSE as an estimator of $\sigma^2$ in preference to the pure error
mean square MSPE, since the former contains more degrees of freedom.

8. Observations at the same level of X are genuine repeats only if they involve
independent trials with respect to the error term. Suppose that in a regression analysis of
the relation between hardness (Y) and amount of carbon (X) in specimens of an alloy, the
error term in the model covers, among other things, random errors in the measurement of
hardness by the analyst and effects of uncontrolled production factors which vary at
random from specimen to specimen and affect hardness. If the analyst takes two readings
on the hardness of a specimen, this will not provide genuine replication because the
effects of random variation in the production factors are fixed in any given specimen. For
genuine replication, different specimens with the same carbon content (X) would have to
be measured by the analyst so that all the effects covered in the error term could vary at
random from one repeated observation to the next.
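
The decomposition in Comment 2 can be sketched in a few lines of code. The sketch below is illustrative only: the function name `lack_of_fit` and the six data points are hypothetical and do not come from the text, and the sketch assumes that repeat observations share exactly equal X values and that at least one level of X is replicated. It fits the reduced model (a straight line) to get SSE, uses the group means of the full model to get SSPE, and forms F* = MSLF / MSPE as in (4.23).

```python
from collections import defaultdict


def lack_of_fit(x, y):
    """Return the pieces of the lack-of-fit F test for a straight-line fit."""
    n = len(x)

    # Reduced model: ordinary least squares line, giving SSE with n - 2 df.
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

    # Full model: the estimated mean at each level of X is the group mean,
    # giving SSPE with n - c df, where c is the number of distinct X levels.
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    c = len(groups)
    sspe = 0.0
    for ys in groups.values():
        group_mean = sum(ys) / len(ys)
        sspe += sum((yi - group_mean) ** 2 for yi in ys)

    # Lack-of-fit component and the test statistic F* = MSLF / MSPE.
    sslf = sse - sspe
    df_lf, df_pe = c - 2, n - c
    f_star = (sslf / df_lf) / (sspe / df_pe)
    return {"SSE": sse, "SSPE": sspe, "SSLF": sslf,
            "df_lf": df_lf, "df_pe": df_pe, "F": f_star}


# Hypothetical data: three X levels, two repeat observations at each level.
x = [1, 1, 2, 2, 3, 3]
y = [1, 2, 4, 5, 3, 4]
result = lack_of_fit(x, y)
print(result["df_lf"], result["df_pe"], round(result["F"], 2))
```

The resulting F* would then be compared with F(1 - α; c - 2, n - c) from a table, exactly as in the example in the text.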


4.5  REMEDIAL  MEASURES 

If  the  simple  linear  regression  model  (3.1)  is  not  appropriate  for  the  data  at 
hand,  there  are  two  basic  choices: 

1.  Abandon  model  (3.1)  and  search  for  a  more  appropriate  model. 

2.  Use  some  transformation  on  the  data  so  that  model  (3.1)  is  appropriate  for 
the  transformed  data. 

Each approach has advantages and disadvantages. The first approach may
entail a more complex model which may yield better insights, but may also lead
into serious difficulties in estimating the parameters. Successful use of
transformations, on the other hand, leads to relatively simple methods of estimation
and may involve fewer parameters than a complex model, an advantage when the
sample size is small. Yet transformations may obscure the fundamental
interconnections between the variables, though at other times they may illuminate them.

We  shall  consider  the  use  of  transformations  in  this  chapter  and  the  use  of 
more  complex  models  in  later  chapters.  First,  we  provide  a  brief  overview  of 
remedial  measures. 


Nonlinearity  of  regression  function 

If the regression function is not linear, a direct approach is to modify model
(3.1) with respect to the nature of the regression function. For instance, a
quadratic regression function might be used:

(4.24)  $E(Y) = \beta_0 + \beta_1 X + \beta_2 X^2$

or an exponential regression function:

(4.25)  $E(Y) = \beta_0 \beta_1^X$

In Chapter 9, we discuss models where the regression function is a polynomial;
and in Chapter 14, we discuss exponential regression functions.




The transformation approach uses a transformation to linearize, at least
approximately, a nonlinear regression function. For instance, the transformation:

(4.26)  $Y' = \log Y$

where $Y'$ is the transformed variable, is often useful. We discuss the use of
transformations to linearize regression functions in Section 4.6.


Nonconstancy  of  error  variance 

If  the  error  variance  is  not  constant  but  varies  in  a  systematic  fashion,  a  direct 
approach  is  to  modify  the  model  to  allow  for  this  and  use  the  method  of  weighted 
least  squares  to  obtain  the  estimators  of  the  parameters.  We  discuss  the  use  of 
weighted  least  squares  for  this  purpose  in  Section  5.7. 

Transformations  can  also  be  effective  in  stabilizing  the  variance.  Some  of 
these  are  discussed  in  Section  4.6.  For  instance,  the  transformation: 

(4.27)  $Y' = \sqrt{Y}$

is  useful  in  a  number  of  applications  for  stabilizing  the  variance. 


Nonindependence  of  error  terms 

If the error terms are correlated, a direct remedial measure is to work with a
model which calls for correlated error terms. We discuss such a model in
Chapter 13. A simple remedial transformation which is often helpful is to work with
first differences, a topic also discussed in Chapter 13.


Nonnormality  of  error  terms 

Lack  of  normality  and  nonconstant  error  variances  frequently  go  hand  in 
hand.  Fortunately,  it  is  often  the  case  that  the  same  transformation  which  helps 
stabilize  the  variance,  such  as  a  logarithmic  or  a  square  root  transformation,  is 
also  helpful  in  normalizing  the  error  terms.  It  is  therefore  desirable  that  the 
transformation  for  stabilizing  the  error  variance  be  utilized  first,  and  then  the 
residuals  studied  to  see  if  serious  departures  from  normality  are  still  present.  We 
discuss  transformations  to  achieve  normality  in  Section  4.6. 


Omission  of  important  independent  variables 

When  residual  analysis  indicates  that  an  important  independent  variable  has 
been  omitted  from  the  model,  the  solution  is  to  modify  the  model.  In  Chapter  7 
and  following  chapters  of  Part  II,  we  discuss  multiple  regression  analysis  in 
which  two  or  more  independent  variables  are  utilized. 




4.6  TRANSFORMATIONS 

We now consider in more detail the use of transformations of one or both of
the original variables before carrying out the regression analysis. Simple
transformations of either the dependent variable Y or the independent variable X, or of
both, are often sufficient to make the simple linear regression model appropriate
for the transformed data. We shall illustrate the use of simple transformations by
three examples.


Example  1 

In columns 1 and 2 of Table 4.7 are presented data on number of days of
training (X) and performance score (Y) for 10 sales trainees in a battery of
simulated sales situations in an experiment. These observations are shown as a
scatter plot in Figure 4.12a. Clearly the regression relation appears to be
curvilinear so that the simple linear regression model (3.1) does not seem to be
appropriate.


TABLE  4.7  Regression  calculations  with  square  root  transformation — sales  training 
example 


            (1)          (2)             (3)              (4)         (5)
          Days of    Performance
          Training      Score
             X            Y        $Y' = \sqrt{Y}$        XY'       $X^2$

             .5           43           6.5574           3.2787        .25
             .5           40           6.3246           3.1623        .25
            1.0           71           8.4261           8.4261       1.00
            1.0           74           8.6023           8.6023       1.00
            1.5          107          10.3441          15.5162       2.25
            1.5          109          10.4403          15.6605       2.25
            2.0          158          12.5698          25.1396       4.00
            2.5          209          14.4568          36.1420       6.25
            3.0          270          16.4317          49.2951       9.00
            3.5          341          18.4662          64.6317      12.25

Total      17.0        1,422         112.6193         229.8545      38.50

In Figure 4.12b, the same data are plotted but the dependent variable has been
transformed as follows:

$$Y' = \sqrt{Y}$$

where $Y'$ denotes the transformed variable. Note that the scatter plot now shows
a reasonably linear relation and that the variability of the scatter is reasonably
constant at the different X levels. Hence, the simple linear regression model (3.1)
now appears to be appropriate.




FIGURE  4.12  Scatter  plots  of  original  and  transformed  observations — sales  training 
example 

[Two scatter plots. (a) Original Observations: Performance Score against Days of
Training. (b) Transformed Observations ($Y' = \sqrt{Y}$) against Days of Training.]


Example  2 

At times, a curvilinear regression relationship is accompanied by systematic
changes in the variability of the error terms and/or by error terms which follow a
highly skewed distribution. In columns 1 and 2 of Table 4.8 are presented data on
age (X) and plasma level of a polyamine (Y) for 14 healthy children. These data
are plotted in Figure 4.13a as a scatter plot. Note the distinct curvilinear
regression relationship, as well as the greater extent of scatter for younger children than
for older ones.


TABLE  4.8  Regression  calculations  with  logarithmic  transformation — plasma  levels 
example 


              (1)            (2)               (3)               (4)       (5)
              Age        Plasma Level
               X              Y        $Y' = \log_{10} Y$        XY'      $X^2$

          0 (newborn)       17.0            1.23045             0           0
          0 (newborn)       11.2            1.04922             0           0
          1                  9.2             .96379              .96379     1
          1                 12.6            1.10037             1.10037     1
          2                  7.4             .86923             1.73846     4
          2                 10.5            1.02119             2.04238     4
          3                  8.3             .91908             2.75724     9
          3                  5.8             .76343             2.29029     9
          4                  4.6             .66276             2.65104    16
          4                  6.5             .81291             3.25164    16
          5                  5.3             .72428             3.62140    25
          5                  3.8             .57978             2.89890    25
          6                  3.2             .50515             3.03090    36
          6                  4.5             .65321             3.91926    36

Total    42                109.9           11.85485            30.26567   182



FIGURE  4.13  Scatter  plots  of  original  and  transformed  observations — plasma  levels 
example 

[Two scatter plots. (a) Original Observations: Plasma Level against Age.
(b) Transformed Observations ($Y' = \log_{10} Y$) against Age.]


In  Figure  4.13b,  the  same  data  are  plotted  but  with  the  dependent  variable 
transformed  as  follows: 


$$Y' = \log_{10} Y$$

Note  that  the  transformation  not  only  has  led  to  a  reasonably  linear  regression 
relation  but  also  that  the  extent  of  scatter  at  the  different  levels  of  X  has  become 
reasonably  constant.  Hence,  use  of  the  simple  linear  regression  model  (3.1)  with 
the  transformed  data  now  appears  to  be  appropriate. 


Example  3 

In Figure 4.14a, we present a scatter plot of data on number of years
experience (X) and current hourly earnings (Y) of five employees in a shop that makes
hairpieces to order. The data suggest strongly that the regression function is
curvilinear. Since the Y values fall within a relatively small range, a
transformation on Y is not likely to be effective and we consider a transformation on the
independent variable:

$$X' = \frac{1}{X}$$

Figure 4.14b contains a scatter plot with the transformed variable $X'$. Note that
this transformation has been successful since the points tend to fall in a linear
pattern.




FIGURE  4.14  Scatter  plots  of  original  and  transformed  observations — hairpiece 
earnings  example 


[Two scatter plots. (a) Original Observations: Hourly Earnings against Years of
Experience. (b) Transformed Observations ($X' = 1/X$): Hourly Earnings against
1/(Years of Experience).]


Useful  transformations 

For many situations, the few simple transformation types just illustrated
suffice to remedy the departures from the simple linear regression model (3.1).
These transformations can be applied either to the dependent variable Y, the
independent variable X, or occasionally to both variables, as follows:


(4.28a)  $Y' = \sqrt{Y}$          $X' = \sqrt{X}$

(4.28b)  $Y' = \log_{10} Y$       $X' = \log_{10} X$

(4.28c)  $Y' = \frac{1}{Y}$       $X' = \frac{1}{X}$

Figure 4.15 is a guide for the selection of the transformation type. Note that in
one case the transformation can be applied either to the Y variable or to the X
variable, or to both. Use of an interactive computer package for preparing scatter
plots based on the different transformations can be most helpful in deciding on an
appropriate transformation.
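
As a rough numerical companion to such scatter plots, one can compute the coefficient of correlation between X and each candidate transformation of Y. This screening device is an assumption of this sketch, not a procedure given in the text, and it is no substitute for looking at the plots and residuals; the helper name `pearson_r` and the dictionary of candidates are hypothetical. The data are the sales training observations of Table 4.7.

```python
import math


def pearson_r(x, y):
    """Coefficient of correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Sales training data from Table 4.7 (columns 1 and 2).
days = [0.5, 0.5, 1.0, 1.0, 1.5, 1.5, 2.0, 2.5, 3.0, 3.5]
score = [43, 40, 71, 74, 107, 109, 158, 209, 270, 341]

# Candidate transformations of Y, following (4.28a) through (4.28c).
candidates = {
    "Y": lambda v: v,
    "sqrt(Y)": math.sqrt,
    "log10(Y)": math.log10,
    "1/Y": lambda v: 1.0 / v,
}

r_by_transform = {
    name: pearson_r(days, [g(v) for v in score])
    for name, g in candidates.items()
}
for name, r in r_by_transform.items():
    print(f"{name:>9}: r = {r:+.4f}")
```

A correlation closer to 1 in absolute value suggests a more nearly linear scatter, but it says nothing about constancy of the error variance, which still has to be judged from residual plots.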

Comments 

1. At times, theoretical or a priori considerations can be utilized to help in choosing an
appropriate transformation. For example, when the shape of the scatter in a study of the
relation between price of a commodity (X) and quantity demanded (Y) is that in
Figure 4.15c, economists may prefer a logarithmic transformation on both Y and X to
linearize the relation because the slope of the regression line for the transformed variables
then measures the price elasticity of demand. The slope is then commonly interpreted as
showing the percent change in quantity demanded per 1 percent change in price, where it
is understood that the changes are in opposite directions.




FIGURE  4.15  Potential  transformations  for  different  curvilinear  patterns 

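[Three panels, each showing a curvilinear scatter pattern of Y against X, with the
transformations to consider for that pattern:]

(a)  $Y' = \sqrt{Y}$ or $\log Y$ or $1/Y$

(b)  $X' = \sqrt{X}$ or $\log X$ or $1/X$

(c)  $Y' = \sqrt{Y}$ or $\log Y$ or $1/Y$, and/or $X' = \sqrt{X}$ or $\log X$ or $1/X$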


Similarly, scientists may prefer logarithmic transformations of both Y and X when
studying the relation between radioactive decay (Y) of a substance and time (X) to
linearize a curvilinear relation of the type illustrated in Figure 4.15c because the slope of
the regression line for the transformed variables then measures the decay rate.

2.  Transformations  of  X  do  not  affect  the  variability  or  shape  of  the  error  distribution, 
while  transformations  of  Y  do.  Hence,  when  curvilinearity  is  accompanied  by  unequal 
variability  or  skewness  of  the  error  distributions,  the  dependent  variable  needs  to  be 
transformed  to  remedy  these  additional  problems.  On  the  other  hand,  when  curvilinearity 
is  accompanied  by  constant  variability,  a  transformation  of  Y  may  lead  to  significant 
unequal  variability  of  the  error  terms  and  it  is  important  to  check  for  this  possibility  by 
examining  residual  plots  after  the  transformation. 

3. When either variable can be transformed (the case of Figure 4.15c), the variable for
which the observations have a wider range should be considered first since a
transformation is not likely to be effective when the range of the observations is narrow.

4. After a transformation has been tentatively selected, residual plots and other
analyses described earlier need to be employed to ascertain that the simple linear regression
model (3.1) is appropriate for the transformed data.

5. When the variance of the error terms is not constant but has a particular relation to
the level of the mean response E(Y) for given X, statistical theory indicates an appropriate
transformation to stabilize the variance. Three important cases are [where $\sigma_i^2$ denotes
the error term variance and $E(Y_i)$ denotes the mean response when $X = X_i$]:

(4.29a)  If $\sigma_i^2$ is proportional to $E(Y_i)$, use $Y' = \sqrt{Y}$

(4.29b)  If $\sigma_i$ is proportional to $E(Y_i)$, use $Y' = \log Y$

(4.29c)  If $\sqrt{\sigma_i}$ is proportional to $E(Y_i)$, use $Y' = \frac{1}{Y}$

At times, the observed variable Y is a proportion, such as the proportion of families
with income X who are planning to purchase a new car next month. An appropriate
transformation for this case is:

(4.29d)  If observation is a proportion, use $Y' = 2 \arcsin \sqrt{Y}$




This  transformation  can  be  made  readily  on  many  calculator  models.  Also,  tables  to 
facilitate  this  transformation  have  been  prepared,  such  as  the  one  in  Reference  4.3  which 
incorporates  a  slight  refinement  over  transformation  (4.29d)  to  improve  the  variance 
stabilization. 
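
The statement in Comment 5 that statistical theory indicates these transformations can be motivated by a first-order Taylor (delta method) approximation. The derivation below is a sketch and is not taken from the text: the proportionality constant $k$, the function $g$, and the binomial variance with $n$ trials per observed proportion in the last case are assumptions introduced for illustration. In each case the approximate variance of the transformed variable no longer depends on the mean $\mu = E(Y)$.

```latex
% Delta method: for a smooth transformation g and Y with mean \mu,
\operatorname{Var}[g(Y)] \approx [g'(\mu)]^2 \, \sigma^2(\mu)

% (4.29a): \sigma^2 = k\mu, \; g(Y) = \sqrt{Y}, \; g'(\mu) = \frac{1}{2\sqrt{\mu}}
\operatorname{Var}[\sqrt{Y}] \approx \frac{1}{4\mu} \cdot k\mu = \frac{k}{4}

% (4.29b): \sigma = k\mu, \; g(Y) = \log_e Y, \; g'(\mu) = \frac{1}{\mu}
\operatorname{Var}[\log_e Y] \approx \frac{1}{\mu^2} \cdot k^2\mu^2 = k^2

% (4.29c): \sqrt{\sigma} = k\mu \text{ so } \sigma^2 = k^4\mu^4, \; g(Y) = \frac{1}{Y}, \; g'(\mu) = -\frac{1}{\mu^2}
\operatorname{Var}\!\left[\frac{1}{Y}\right] \approx \frac{1}{\mu^4} \cdot k^4\mu^4 = k^4

% (4.29d): assumed binomial proportion, \sigma^2 = \frac{\mu(1-\mu)}{n}, \;
% g(Y) = 2\arcsin\sqrt{Y}, \; g'(\mu) = \frac{1}{\sqrt{\mu(1-\mu)}}
\operatorname{Var}[2\arcsin\sqrt{Y}] \approx \frac{1}{\mu(1-\mu)} \cdot \frac{\mu(1-\mu)}{n} = \frac{1}{n}
```

Using logarithms to base 10 instead of natural logarithms in the second case only multiplies the constant by $(\log_{10} e)^2$.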


Regression  analysis  with  transformed  data 

Once  the  data  have  been  transformed  to  make  the  simple  linear  regression 
model  (3.1)  appropriate,  the  regression  calculations  are  carried  out  in  the  usual 
fashion  with  the  transformed  data.  We  illustrate  these  calculations  for  two  earlier 
examples. 


Example 1. Table 4.7 contains for the sales training example the
basic calculations required for the least squares estimators $b_0$ and $b_1$. Since
$Y' = \sqrt{Y}$ now plays the role of $Y$ in all earlier formulas, we obtain:

$$b_1 = \frac{\sum X_i Y'_i - \frac{\sum X_i \sum Y'_i}{n}}{\sum X_i^2 - \frac{(\sum X_i)^2}{n}} = \frac{229.8545 - \frac{(17)(112.6193)}{10}}{38.5 - \frac{(17)^2}{10}} = 4.00017$$

$$b_0 = \frac{1}{n}\left(\sum Y'_i - b_1 \sum X_i\right) = \frac{1}{10}[112.6193 - 4.00017(17)] = 4.46164$$

The fitted regression function is:

$$\hat{Y}' = 4.46164 + 4.00017X$$

where $\hat{Y}'$ is the point estimator of $E(Y')$, the mean of the probability distribution
of $Y'$ for given $X$.

If we wish to obtain the fitted regression equation in the original units, we
simply take squares:

$$\hat{Y} = (4.46164 + 4.00017X)^2$$

For example, when $X = 3$, we have:

$$\hat{Y} = [4.46164 + 4.00017(3)]^2 = 271.0$$

Figure 4.16 presents a residual plot for the fitted regression model based on
the transformed data. This plot shows no evidence of lack of fit and, in view of
the few observations for large X values, no strong evidence of unequal error
variances. Hence, the square root transformation of Y appears to have been
effective here.
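
The calculations for this example can be reproduced with a short script. The sketch below is illustrative; the helper name `fit_line` is hypothetical, and the code uses unrounded square roots, so its results can differ from the hand calculation (which works from the four-decimal values in Table 4.7) in the later decimal places.

```python
import math


def fit_line(x, y):
    """Least squares estimates (b0, b1) from the usual sums of X, Y, XY, X^2."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)
    b0 = (sum_y - b1 * sum_x) / n
    return b0, b1


# Sales training data from Table 4.7 (columns 1 and 2).
days = [0.5, 0.5, 1.0, 1.0, 1.5, 1.5, 2.0, 2.5, 3.0, 3.5]
score = [43, 40, 71, 74, 107, 109, 158, 209, 270, 341]

# Regress Y' = sqrt(Y) on X, then square to return to the original units.
y_prime = [math.sqrt(v) for v in score]
b0, b1 = fit_line(days, y_prime)
fitted_at_3 = (b0 + b1 * 3) ** 2

# Residuals on the transformed scale, as plotted in a residual plot.
residuals = [yp - (b0 + b1 * xi) for xi, yp in zip(days, y_prime)]

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, fitted Y at X = 3: {fitted_at_3:.1f}")
```

The residuals computed on the transformed scale are the quantities one would plot against X to check the aptness of the transformed model.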


FIGURE 4.16  Residual plot for transformed observations—sales
training example

[Residual plot: Residual against Days of Training.]


Example 2. Table 4.8 contains the necessary least squares calculations
for our plasma levels example where the transformation $Y' = \log_{10} Y$ was
found to be effective in linearizing a curvilinear relationship and in stabilizing the
error variance. Since $Y'$ now plays the role of $Y$ in all earlier formulas, we obtain:


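$$b_1 = \frac{\sum X_i Y'_i - \frac{\sum X_i \sum Y'_i}{n}}{\sum X_i^2 - \frac{(\sum X_i)^2}{n}} = \frac{30.26567 - \frac{(42)(11.85485)}{14}}{182 - \frac{(42)^2}{14}} = -.094623$$

$$b_0 = \frac{1}{n}\left(\sum Y'_i - b_1 \sum X_i\right) = \frac{1}{14}[11.85485 - (-.094623)(42)] = 1.130644$$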


The  fitted  regression  function  therefore  is: 

$$\hat{Y}' = 1.130644 - .094623X$$

where $\hat{Y}'$ is the point estimator of $E(Y')$, the mean of the probability distribution
of $Y'$ for given $X$. This fitted regression function is plotted in Figure 4.13b.

If we wish to obtain the fitted regression equation in the original units, we
simply take antilogs:

$$\hat{Y} = \text{antilog}_{10}\,(1.130644 - .094623X)$$

To find the fitted value $\hat{Y}$ in the original units when $X = 3$, for example, we have:

$$\hat{Y} = \text{antilog}_{10}\,[1.130644 - .094623(3)] = 7.03$$
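
The same calculation can be checked by machine. The sketch below is illustrative only; the helper name `fit_line` is hypothetical, and because the code uses unrounded logarithms its results may differ slightly from the hand calculation in the last decimal places.

```python
import math


def fit_line(x, y):
    """Least squares estimates (b0, b1) from the usual sums of X, Y, XY, X^2."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)
    b0 = (sum_y - b1 * sum_x) / n
    return b0, b1


# Plasma level data from Table 4.8 (columns 1 and 2).
age = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
plasma = [17.0, 11.2, 9.2, 12.6, 7.4, 10.5, 8.3, 5.8,
          4.6, 6.5, 5.3, 3.8, 3.2, 4.5]

# Regress Y' = log10(Y) on X, then take antilogs to return to original units.
y_prime = [math.log10(v) for v in plasma]
b0, b1 = fit_line(age, y_prime)
fitted_at_3 = 10 ** (b0 + b1 * 3)

print(f"b0 = {b0:.5f}, b1 = {b1:.5f}, fitted Y at X = 3: {fitted_at_3:.2f}")
```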


Comments 

1. We reiterate the importance of checking the model (3.1) assumptions if a
transformation on Y is employed. For instance, when the transformation $Y' = \log Y$ is
employed with model (3.1), it is assumed that the distribution of $\log Y$ for given X is
normal with constant variance. This needs to be checked after the transformation has
been made.

2. When transformed models are employed, the estimators $b_0$ and $b_1$ obtained by least
squares have the least squares properties with respect to the transformed observations, not
the original ones.


PROBLEMS

4.1. Distinguish between: (1) residual and standardized residual, (2) $E(\varepsilon_i) = 0$ and
$\bar{e} = 0$, (3) error term and residual.

4.2.  Prepare  a  prototype  residual  plot  for  each  of  the  following  cases:  (1)  error  variance 
decreases  with  X,  (2)  true  regression  function  is  U  shaped  but  a  linear  regression 
function  is  fitted. 

4.3. Refer to Grade point average Problem 2.15. The fitted values and residuals are:

  i:               1      2      3      4      5      6      7      8      9     10
  $\hat{Y}_i$:   2.92   2.33   2.25   1.58   2.08   3.51   3.34   2.67   2.25   1.91
  $e_i$:          .18   -.03    .75    .32    .42    .19    .06   -.07    .55   -.31

  i:              11     12     13     14     15     16     17     18     19     20
  $\hat{Y}_i$:   2.42   2.84   2.50   3.59   2.16   1.91   2.50   3.26   1.74   2.25
  $e_i$:         -.42    .06   -.20   -.39   -.36   -.51   -.50    .54    .46   -.75

a. Plot the residuals $e_i$ against the fitted values $\hat{Y}_i$. What departures from
regression model (3.1) can be studied from this plot? What are your findings?

b. Prepare a normal probability plot of the residuals. Also obtain the coefficient
of correlation between the ordered residuals and their expected values under
normality. Does the normality assumption appear to be reasonable here?

c. Information is given below for each student on two variables not included in
the model, namely, intelligence test score ($X_2$) and high school average ($X_3$).
Prepare additional residual plots to ascertain whether the model can be
improved by including either of these variables. What do you conclude?


  i:          1     2     3     4     5     6     7     8     9    10
  $X_2$:    105   113   118   107   110   125   115   121   117   111
  $X_3$:    2.9   2.8   3.1   2.4   3.0   2.4   3.5   3.1   3.1   2.9

  i:         11    12    13    14    15    16    17    18    19    20
  $X_2$:    123   114   120   132   122   110   119   109   116   108
  $X_3$:    3.2   3.3   3.4   2.6   3.0   2.8   3.3   3.4   2.6   2.7

4.4. Refer to Calculator maintenance Problem 2.16. The fitted values and residuals
are:

  i:                 1      2      3      4      5      6      7      8      9
  $\hat{Y}_i$:   100.8   86.1   71.4   12.4   71.4   56.6  100.8   41.9   56.6
  $e_i$:          -3.8    -.1    6.6   -2.4    3.6    5.4     .2   -2.9   -3.6

  i:                10     11     12     13     14     15     16     17     18
  $\hat{Y}_i$:    27.2  115.6   71.4   27.2   71.4  100.8   12.4   56.6   71.4
  $e_i$:           5.8    2.4   -6.4   -2.2    -.4    4.2    4.6   -7.6   -3.4



a. Prepare residual plots of $e_i$ versus $\hat{Y}_i$ and $e_i$ versus $X_i$. What departures from
regression model (3.1) can be studied from these plots? State your findings.

b. Prepare a normal probability plot of the residuals. Also obtain the coefficient
of correlation between the ordered residuals and their expected values under
normality. Does the normality assumption appear to be tenable here?

c. The observations are given in time order. Plot the residuals against time to
ascertain whether or not the error terms are correlated over time. What is your
conclusion?

d. Information is given below on two variables not included in the model,
namely, mean operational age of machines serviced on the call ($X_2$, in
months) and years of experience of the service person making the call ($X_3$).
Make additional residual plots to ascertain whether the model can be
improved by including either or both of these variables. What do you conclude?


  i:         1    2    3    4    5    6    7    8    9
  $X_2$:    12   21   38   16   25   32   18   14   12
  $X_3$:     3    6    2    2    3    4    5    2    3

  i:        10   11   12   13   14   15   16   17   18
  $X_2$:    35   20    8   15   17   28   29    9   14
  $X_3$:     6    5    3    5    6    3    5    3    6

4.5.  Refer  to  Airfreight  breakage  Problem  2.17. 

a. Obtain the residuals $e_i$ and plot them against $X_i$ to ascertain whether any
departures from regression model (3.1) are evident. What is your conclusion?

b.  Prepare  a  normal  probability  plot  of  the  residuals.  Also  obtain  the  coefficient 
of  correlation  between  the  ordered  residuals  and  their  expected  values  under 
normality  to  ascertain  whether  the  normality  assumption  is  reasonable  here. 
What  do  you  conclude? 

4.6.  Refer  to  Plastic  hardness  Problem  2.18. 

a. Obtain the residuals $e_i$ and plot them against the fitted values $\hat{Y}_i$ to ascertain
whether any departures from regression model (3.1) are evident. State your
findings.

b.  Prepare  a  normal  probability  plot  of  the  residuals.  Also  obtain  the  coefficient 
of  correlation  between  the  ordered  residuals  and  their  expected  values  under 
normality.  Does  the  normality  assumption  appear  to  be  reasonable  here? 

4.7.  Refer  to  Muscle  mass  Problem  2.23. 

a. Obtain the residuals $e_i$ and plot them against $\hat{Y}_i$ and also against $X_i$ to ascertain
whether any departures from regression model (3.1) are evident. State your
conclusions.

b.  Prepare  a  normal  probability  plot  of  the  residuals.  Also  obtain  the  coefficient 
of  correlation  between  the  ordered  residuals  and  their  expected  values  under 
normality  to  ascertain  whether  the  normality  assumption  is  tenable  here. 
What  do  you  conclude? 

c.  The  observations  are  given  in  time  order.  Plot  the  residuals  against  time  to  see 
whether  or  not  the  error  terms  are  uncorrelated  over  time.  What  is  your 
finding? 

4.8.  Refer  to  Robbery  rate  Problem  2.24. 

a. Obtain the residuals and make a residual plot of $e_i$ versus $\hat{Y}_i$. What does the
plot show?




b.  Prepare  a  normal  probability  plot  of  the  residuals.  Also  obtain  the  coefficient 
of  correlation  between  the  ordered  residuals  and  their  expected  values  under 
normality.  What  do  you  conclude? 

4.9. Electricity consumption. An economist studying the relation between
household electricity consumption (Y) and number of rooms in the home (X) employed
linear regression model (3.1) and obtained the following residuals:

  i:         1     2     3     4     5     6     7     8     9    10
  $X_i$:     2     3     4     5     6     7     8     9    10    11
  $e_i$:   3.2   2.9  -1.7  -2.0  -2.3  -1.2   -.9    .8    .7    .5

Plot the residuals $e_i$ against $X_i$. What problem appears to be present here? Might a
transformation alleviate this problem?

4.10. Per capita earnings. A sociologist employed linear regression model (3.1) to
relate per capita earnings (Y) to average number of years of schooling (X) for 12
cities. The fitted values $\hat{Y}_i$ and the standardized residuals $e_i/\sqrt{MSE}$ follow.

  i:                    1     2     3     4     5     6     7     8     9     10    11    12
  $\hat{Y}_i$:        9.9   9.3  10.2   9.6  10.2  12.4  14.3   9.6   9.2   15.6  11.2  13.1
  $e_i/\sqrt{MSE}$: -1.12   .81  -.76   .43   .65  -.17  1.62  1.79  -.53  -3.78   .74   .32

Plot the standardized residuals against the fitted values. What does the plot
suggest?

4.11. Drug concentration. A pharmacologist employed linear regression model (3.1)
to study the relation between the concentration of a drug in plasma (Y) and the
log-dose of the drug (X). The residuals and log-dose levels follow.

  i:         1     2     3     4     5     6     7     8     9
  $X_i$:    -1     0     1    -1     0     1    -1     0     1
  $e_i$:    .5   2.1  -3.4    .3  -1.7   4.2   -.6   2.6  -4.0

Plot the residuals $e_i$ against $X_i$. What conclusions do you draw from the plot?

4.12.  A  student  states  that  she  doesn’t  understand  why  the  sum  of  squares  defined  in 
(4.9)  is  called  a  pure  error  sum  of  squares  “since  the  formula  looks  like  one  for  an 
ordinary  sum  of  squares.”  Explain. 

4.13.  Refer  to  Calculator  maintenance  Problem  2.16.  Some  additional  calculational 
results  are:  SSR  =  16,182.6,  SSE  =  321.4. 

a. In an F test for lack of fit of a linear regression function, what are the
alternative conclusions?

b.  Perform  the  test  indicated  in  part  (a).  Control  the  risk  of  Type  I  error  at  .05. 
State  the  decision  rule  and  conclusion. 

c.  Does  your  test  in  part  (b)  detect  other  departures  from  model  (3.1),  such  as 
lack  of  constant  variance  or  lack  of  normality  in  the  error  terms?  Could  the 
results  of  the  test  be  affected  by  such  departures?  Discuss. 

4.14.  Refer  to  Plastic  hardness  Problem  2.18. 

a.  Perform  an  F  test  to  determine  whether  or  not  there  is  lack  of  fit  of  a  linear 
regression  function.  Use  a  level  of  significance  of  .01.  State  the  alternatives, 
decision  rule,  and  conclusion. 

b.  Assuming  that  the  number  of  replications  here  was  limited  in  advance  to  four, 
is  there  any  advantage  in  conducting  these  all  at  the  same  level  of  X?  Is  there 
any  disadvantage? 



c.  Does  the  test  in  part  (a)  indicate  what  regression  function  is  appropriate  when 
it  leads  to  the  conclusion  that  the  regression  function  is  not  linear?  How 
would  you  proceed? 

4.15. Solution concentration. A chemist studied the concentration of a solution (Y)
over time (X). Fifteen identical solutions were prepared. The 15 solutions were
randomly divided into five sets of three, and the five sets were measured,
respectively, after 1, 3, 5, 7, and 9 hours. The results follow.

  i:        1    2    3    4    5    6    7    8    9    10    11    12    13    14    15
  $X_i$:    9    9    9    7    7    7    5    5    5     3     3     3     1     1     1
  $Y_i$:  .07  .09  .08  .16  .17  .21  .49  .58  .53  1.22  1.15  1.07  2.84  2.57  3.10

a.  Fit  a  linear  regression  function. 

b. Perform an F test to determine whether or not there is lack of fit of a linear
regression function; use $\alpha = .025$. State the alternatives, decision rule, and
conclusion.

c. Does the test in part (b) indicate what regression function is appropriate when
it leads to the conclusion that lack of fit of a linear regression function exists?
Explain.

4.16.  Refer  to  Solution  concentration  Problem  4.15. 

a.  Prepare  a  scatter  plot  of  the  data.  What  transformations  might  you  try  to 
achieve  linearity? 

b.  Use  transformation  Y'  =  log10  Y  and  obtain  the  estimated  linear  regression 
function  for  the  transformed  data. 

c. Plot the estimated regression line and the transformed data. Does the
regression line appear to be a good fit to the transformed data?

d.  Obtain  the  residuals  and  plot  them  against  the  fitted  values.  Also  prepare  a 
normal  probability  plot.  What  do  your  plots  show? 

e.  Express  the  estimated  regression  equation  in  the  original  units. 

4.17.  Sales  growth.  A  marketing  researcher  studied  annual  sales  of  a  product  that  had 
been  introduced  10  years  ago.  The  data  were  as  follows,  where  X  is  the  year 
(coded)  and  Y  is  sales  in  thousands  of  units: 

i:        1     2     3     4     5     6     7     8     9    10
$X_i$:    0     1     2     3     4     5     6     7     8     9
$Y_i$:   98   135   162   178   221   232   283   300   374   395

a. Prepare a scatter plot of the data. Does a linear relation appear adequate here?

b. Use transformation $Y' = \sqrt{Y}$ and obtain the estimated linear regression
function for the transformed data.

c. Plot the estimated regression line and the transformed data. Does the
regression line appear to be a good fit to the transformed data?

d.  Obtain  the  residuals  and  plot  them  against  the  fitted  values.  Also  prepare  a 
normal  probability  plot.  What  do  your  plots  show? 

e.  Express  the  estimated  regression  equation  in  the  original  units. 


EXERCISES 


4.18.  A  student  fitted  a  linear  regression  function  for  a  class  assignment.  Some  results 
follow. 




i:              1     2     3     4     5
$Y_i$:         35    17    42    28    53
$\hat{Y}_i$:   42    29    32    32    40
$e_i$:         -7   -12    10    -4    13

The student plotted the residuals $e_i$ against $Y_i$ and found a positive relation. When
he plotted the residuals against the fitted values $\hat{Y}_i$, he found no relation. Why is
there this difference, and which is the more meaningful plot?

4.19. If the error terms in a regression model are independent $N(0, \sigma^2)$, what can be
said about the error terms after transformation $X' = 1/X$ is used? Is the situation
the same after transformation $Y' = 1/Y$ is used?

4.20. Using theorems (1.65), (1.36), and (1.37), show that $E(MSPE) = \sigma^2$ for the
normal error regression model (3.1).


PROJECTS 

4.21. Machine speed. The number of defective items produced by a machine (Y) is
known to be linearly related to the speed setting of the machine (X). The data
below were collected from recent quality control records.

i:        1     2     3     4     5     6     7     8     9    10    11    12
$X_i$:  200   400   300   400   200   300   300   400   200   400   200   300
$Y_i$:   28    75    37    53    22    58    40    96    46    52    30    69

a. Assuming regression model (3.1) is appropriate, obtain the estimated regression
function and plot the residuals against X. What does the residual plot show?

b. Calculate the sample variance $s^2$ of the Y observations for each of the following
machine speeds: X = 200, 400. Then use the F test in Table 1.4a to test
whether or not the variances at these two X levels are equal. Use $\alpha = .05$.
State the alternatives, decision rule, and conclusion.

c. Compute $\bar{Y}/\sqrt{s}$, $\bar{Y}/s$, and $\bar{Y}/s^2$ for each of the three X levels. Suggest an
appropriate transformation from (4.29) to stabilize the variance based on your
computed ratios.

d. Make the transformation suggested in part (c) and obtain the estimated
regression line for the transformed data. Plot the residuals against X. Does it
appear from your plot that the purpose of the transformation has been attained?

4.22. Blood pressure. The following data were obtained in a study of the relation
between diastolic blood pressure (Y) and age (X) for boys 5 to 13 years old.

i:        1     2     3     4     5     6     7     8
$X_i$:    5     8    11     7    13    12    12     6
$Y_i$:   63    67    74    64    75    69    90    60

a. Assuming regression model (3.1) is appropriate, obtain the estimated regression
function and plot the residuals $e_i$ against $X_i$. What does your residual plot
show?




b. Omit observation 7 from the data and obtain the estimated regression line
based on the remaining seven observations. Compare this estimated regression
function to that obtained in part (a). What can you conclude about the
effect of observation 7?

c. Using your fitted regression function in part (b), obtain a 99 percent prediction
interval for a new Y observation at X = 12. Does observation $Y_7$ fall
outside this prediction interval? What is the significance of this?

4.23. Refer to the SMSA data set and Project 2.38. For each of the three fitted
regression models, obtain the residuals and prepare a residual plot against X and a
normal probability plot. Summarize your conclusions. Is linear regression model
(3.1) more apt in one case than in the others?

4.24. Refer to the SMSA data set and Project 2.39. For each geographic region, obtain
the residuals and prepare a residual plot against X and a normal probability plot.
Do the four regions appear to have similar error variances? What other
conclusions do you draw from your plots?

4.25.  Refer  to  the  SENIC  data  set  and  Project  2.40. 

a. For each of the three fitted regression models, obtain the residuals and
prepare a residual plot against X and a normal probability plot. Summarize your
conclusions. Is linear regression model (3.1) more apt in one case than in the
others?

b. Obtain the fitted regression line for the relation between length of stay and
infection risk after deleting observations 47 ($X_{47} = 6.5$, $Y_{47} = 19.56$) and
112 ($X_{112} = 5.9$, $Y_{112} = 17.94$). From this fitted regression line obtain
separate 95 percent prediction intervals for new Y observations at X = 6.5 and
X = 5.9, respectively. Do observations $Y_{47}$ and $Y_{112}$ fall outside these
prediction intervals? Discuss the significance of this.

4.26. Refer to the SENIC data set and Project 2.41. For each geographic region, obtain
the residuals and prepare a residual plot against X and a normal probability plot.
Do the four regions appear to have similar error variances? What other
conclusions do you draw from your plots?




Simultaneous  inferences  and  other 
topics  in  regression  analysis — I 


In this chapter, we take up a variety of topics in simple regression analysis.
Several of the topics pertain to the problem of how to make simultaneous
inferences from the same set of sample observations.


5.1  JOINT ESTIMATION OF $\beta_0$ AND $\beta_1$

Need  for  joint  estimation 

A market research analyst conducted a study of the relation between level of
advertising (X) and sales (Y), in which there was no advertising (X = 0) for some
observations while for other observations the level of advertising was varied. The
scatter plot suggested a linear regression in the range of the advertising
expenditure levels studied. The analyst wished to draw inferences about both the
intercept $\beta_0$ and the slope $\beta_1$. One means of doing this is by constructing a joint
confidence region for $\beta_0$ and $\beta_1$ so that with confidence level $1 - \alpha$ both $\beta_0$ and
$\beta_1$ are contained in this region.

Joint  confidence  region 

A joint $1 - \alpha$ confidence region for $\beta_0$ and $\beta_1$ is illustrated in Figure 5.1. It
can be shown that such a region is given by:




(5.1)  $$\frac{n(b_0 - \beta_0)^2 + 2(\sum X_i)(b_0 - \beta_0)(b_1 - \beta_1) + (\sum X_i^2)(b_1 - \beta_1)^2}{2\,MSE} \le F(1 - \alpha;\ 2,\ n - 2)$$

The confidence coefficient $1 - \alpha$ indicates that with repeated sampling, the
confidence region (5.1) will contain both $\beta_0$ and $\beta_1$ in $(1 - \alpha)100$ percent of the
cases. The confidence region consists of all points $(\beta_0, \beta_1)$ which satisfy the
inequality (5.1). The boundary of the confidence region is obtained from the
equality in (5.1):

(5.2)  $$\frac{n(b_0 - \beta_0)^2 + 2(\sum X_i)(b_0 - \beta_0)(b_1 - \beta_1) + (\sum X_i^2)(b_1 - \beta_1)^2}{2\,MSE} = F(1 - \alpha;\ 2,\ n - 2)$$

The boundary is an ellipse centered around the point $(b_0, b_1)$, as illustrated in
Figure 5.1. We explain now by an example how the boundary is calculated.

FIGURE 5.1  Elliptical joint 90 percent confidence region for $\beta_0$ and $\beta_1$: Westwood
Company example

[Figure: ellipse centered at $(b_0, b_1) = (10.0, 2.0)$; horizontal axis $\beta_0$ (about 5 to 15),
vertical axis $\beta_1$ (1.80 to 2.20).]

Example. Let us return to the Westwood Company lot size example of the
previous chapters. Suppose that we require a joint confidence region for $\beta_0$ and
$\beta_1$ with .90 confidence coefficient. Then $1 - \alpha = .90$ and $F(.90; 2, 8) = 3.11$.
From previous work (Table 3.1), we have: $b_0 = 10.0$, $b_1 = 2.0$, $MSE = 7.5$,
$\sum X_i = n\bar{X} = 500$, $\sum X_i^2 = 28,400$. Substitution into (5.2) gives:

$$\frac{10(10.0 - \beta_0)^2 + 2(500)(10.0 - \beta_0)(2.0 - \beta_1) + 28,400(2.0 - \beta_1)^2}{2(7.5)} = 3.11$$


Boundary points are calculated by assigning a value to either $\beta_0$ or $\beta_1$ and finding
corresponding values of the other unknown. For example, let $\beta_0 = 10.0$. The
quantity $(10.0 - \beta_0)$ then equals 0, and the above expression reduces to:


$$\frac{28,400(2.0 - \beta_1)^2}{2(7.5)} = 3.11$$

Manipulation gives:

$$\beta_1^2 - 4.0\beta_1 + 3.998357 = 0$$

This  quadratic  equation  has  two  roots,  1.95947  and  2.04053  (see  the  following 
Comment  3  for  a  brief  review  of  the  solution  of  a  quadratic  equation).  Hence, 
two  boundary  points  are  (10.0,  1.96)  and  (10.0,  2.04). 

Additional boundary points are found in the same manner, by assigning a
value to either $\beta_0$ or $\beta_1$ and solving for the other. The points can be plotted on a
graph and connected to form the boundary of the elliptical confidence region, as
shown in Figure 5.1. The region is always centered around $(b_0, b_1)$. In our
example, it is centered around $(b_0, b_1) = (10.0, 2.0)$. The joint confidence
region indicates that $\beta_1$ is somewhere between 1.88 and 2.12, and that $\beta_0$ would be
in the neighborhood of 4.0 if $\beta_1$ were near the upper limit and about 16.0 if $\beta_1$
were near its lower limit. Note carefully the interrelation between the estimates
for $\beta_0$ and $\beta_1$: the larger $\beta_1$ is, the smaller $\beta_0$ would be, and vice versa. This
interrelation is the result of the tilted position of the ellipse, with the major axis
being sloped negatively.
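
The boundary calculation in this example can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `boundary_beta1` and the variable names are ours, and the F percentile 3.11 is taken as the tabled value quoted in the example rather than computed. For a chosen $\beta_0$, equality (5.2) is a quadratic in $d_1 = b_1 - \beta_1$, which the code solves directly.

```python
import math

# Summary statistics quoted in the text for the Westwood Company example.
n = 10
b0, b1 = 10.0, 2.0
mse = 7.5
sum_x = 500.0
sum_x2 = 28400.0
f_crit = 3.11  # F(.90; 2, 8), tabled value quoted in the text (assumed, not computed)


def boundary_beta1(beta0):
    """Return the two beta1 values on boundary (5.2) for a given beta0.

    With d0 = b0 - beta0 and d1 = b1 - beta1, the boundary is
        sum_x2*d1**2 + 2*sum_x*d0*d1 + n*d0**2 - 2*mse*f_crit = 0,
    a quadratic in d1. Returns None when no boundary point exists at this beta0.
    """
    d0 = b0 - beta0
    a = sum_x2
    b = 2.0 * sum_x * d0
    c = n * d0 ** 2 - 2.0 * mse * f_crit
    disc = b ** 2 - 4.0 * a * c
    if disc < 0:
        return None
    root = math.sqrt(disc)
    d1_low = (-b - root) / (2.0 * a)
    d1_high = (-b + root) / (2.0 * a)
    # beta1 = b1 - d1, so the larger d1 gives the smaller beta1.
    return (b1 - d1_high, b1 - d1_low)


lower, upper = boundary_beta1(10.0)
print(round(lower, 5), round(upper, 5))
```

Calling `boundary_beta1` over a grid of $\beta_0$ values gives pairs of points that can be plotted and connected to trace the ellipse.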

Comments 


1. The joint confidence region can be used directly for testing. To illustrate this use,
suppose an industrial engineer working for the Westwood Company theorized that the
regression function should have an intercept of 13.0 and a slope of 2.10. Since the point
(13.0, 2.10) does not fall in the joint confidence region, we would conclude at the
$\alpha = .10$ level of significance that either $\beta_0 \ne 13.0$ or $\beta_1 \ne 2.10$ or both. Note from Figure 5.1
that the engineer may be correct with respect to either the intercept or the slope, but that
this particular combination is not supported by the data.

2. The tilt of the ellipse is a function of the covariance between $b_0$ and $b_1$. It can be
shown that:

(5.3)  $$\sigma(b_0, b_1) = -\bar{X}\sigma^2(b_1)$$

Both the tilt of the ellipse and the covariance indicate the degree to which the point
estimates of $\beta_0$ and $\beta_1$ obtained from the same sample are likely to err in a similar or an
opposite direction because of sampling error. A positive covariance indicates a tendency
for the values of $b_0$ and $b_1$ to be jointly too high or jointly too low, while a negative
covariance means that the joint errors tend to be in opposite directions.

In our example, $\bar{X} = 50$; hence, the covariance is negative and the ellipse's major axis
is sloped negatively. This implies that the estimators $b_0$ and $b_1$ tend to err in opposite
directions. We expect this intuitively. Since the observed points $(X_i, Y_i)$ fall in the first
quadrant, we anticipate that if the slope of the fitted regression line is too steep ($b_1$
overestimates $\beta_1$), the intercept is most likely to be too low ($b_0$ underestimates $\beta_0$), and
vice versa.

When the independent variable is $X_i - \bar{X}$, we know that $b_0'$ and $b_1$ are independent and
hence have zero covariance. Thus, the ellipse will in this case have axes parallel to the
axes of the graph so that there is no tilt in the confidence region.

3. The roots of a quadratic equation of the form $ax^2 + bx + c = 0$ are given by:

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

In our earlier example, the quadratic equation was $\beta_1^2 - 4.0\beta_1 + 3.998357 = 0$, so that
$a = 1$, $b = -4.0$, and $c = 3.998357$. Hence:

$$\beta_1 = \frac{-(-4.0) \pm \sqrt{(-4.0)^2 - 4(1)(3.998357)}}{2(1)} = 1.95947 \text{ and } 2.04053$$
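
The testing use of the region described in Comment 1, and the sign of the covariance in Comment 2, can be checked numerically. The sketch below assumes the Westwood Company values quoted in this section; the function name `in_joint_region` is ours. Replacing $\sigma^2(b_1)$ in (5.3) by the estimate $s^2(b_1)$, with $s(b_1) = .04697$ as given for this example, is also an assumption made here for illustration.

```python
def in_joint_region(beta0, beta1, n=10, b0=10.0, b1=2.0, mse=7.5,
                    sum_x=500.0, sum_x2=28400.0, f_crit=3.11):
    """Evaluate the left side of (5.1) at (beta0, beta1).

    Returns (inside, statistic), where inside is True when the statistic
    does not exceed the tabled F percentile f_crit = F(.90; 2, 8).
    """
    d0 = b0 - beta0
    d1 = b1 - beta1
    statistic = (n * d0 ** 2 + 2.0 * sum_x * d0 * d1 + sum_x2 * d1 ** 2) / (2.0 * mse)
    return statistic <= f_crit, statistic


# Comment 1: the engineer's hypothesized point (13.0, 2.10).
inside, statistic = in_joint_region(13.0, 2.10)
print(inside, round(statistic, 2))

# Comment 2: estimated covariance from (5.3), using s(b1) in place of sigma(b1).
x_bar = 50.0
s_b1 = 0.04697
est_cov = -x_bar * s_b1 ** 2
print(est_cov < 0)
```

The statistic for the hypothesized point is far above 3.11, which agrees with the conclusion that the combination (13.0, 2.10) is not supported by the data, and the estimated covariance is negative, in line with the negatively sloped major axis.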


Bonferroni  joint  confidence  intervals 

The procedure for developing a joint confidence region for $\beta_0$ and $\beta_1$ is
somewhat cumbersome. Also, in multiple regression, where additional parameters
are involved in the model, the joint confidence region involves three or more
dimensions, which is difficult to visualize.

Hence, it is often preferable to construct separate confidence intervals for
each parameter. For instance, by the methods of Chapter 3, the market research
analyst in our earlier example could construct separate 95 percent confidence
intervals for $\beta_0$ and $\beta_1$. The difficulty is that these would not provide 95 percent
confidence that the conclusions for both $\beta_0$ and $\beta_1$ are correct. If the inferences
were independent, the probability of both being correct would be $(.95)^2$, or only
.9025. The inferences are not, however, independent, coming as they do from
the same set of sample data, which makes the determination of the probability of
both inferences being correct much more difficult.

Analysis of data frequently requires a series of estimates (or tests) where the
analyst would like to have an assurance about the correctness of the entire set of
estimates (or tests). We shall call the set of estimates of interest the family of
estimates. In our illustration, the family consists of the estimates of $\beta_0$ and $\beta_1$.
We then distinguish between a statement confidence coefficient and a family
confidence coefficient. The former is the familiar type of confidence coefficient
discussed earlier, which indicates the proportion of correct estimates that are
obtained when repeated samples are selected and the specified confidence interval
is calculated for each sample. A family confidence coefficient, on the other
hand, indicates the proportion of correct families of estimates when repeated
samples are selected and the specified confidence intervals for the entire family
are calculated for each sample.




To illustrate the meaning of a family confidence coefficient further, let us
return to the joint estimation of $\beta_0$ and $\beta_1$. A family confidence coefficient of,
say, .95 would indicate for this situation that if repeated samples are selected and
interval estimates for both $\beta_0$ and $\beta_1$ are calculated for each sample by specified
procedures, 95 percent of the samples would lead to a family of estimates where
both confidence intervals are correct. For 5 percent of the samples, either one or
both of the interval estimates would be incorrect.

Clearly, a procedure which provides a family confidence coefficient is often
highly desirable since it permits the analyst to weave the separate results together
into an integrated set of conclusions, with an assurance that the entire set of
estimates is correct. The Bonferroni method of developing joint confidence
intervals with a specified family confidence coefficient is a very simple one: each
statement confidence coefficient is adjusted to be higher than $1 - \alpha$ so that the
family confidence coefficient is at least $1 - \alpha$. The method is a general one which can
be applied in many cases, as we shall see, not just for the joint estimation of $\beta_0$
and $\beta_1$. Here, we explain the Bonferroni method as it applies for estimating $\beta_0$
and $\beta_1$ jointly.

Development of joint confidence intervals. We start with ordinary confidence
limits for $\beta_0$ and $\beta_1$ with statement confidence coefficients $1 - \alpha$ each.
These are:

$$b_0 \pm t(1 - \alpha/2;\ n - 2)s(b_0)$$
$$b_1 \pm t(1 - \alpha/2;\ n - 2)s(b_1)$$

We then ask what is the probability that both sets of limits are correct. Let $A_1$
denote the event that the first confidence interval does not cover $\beta_0$ and $A_2$ denote
the event that the second confidence interval does not cover $\beta_1$. We know:

$$P(A_1) = \alpha \qquad P(A_2) = \alpha$$

Probability theorem (1.6) states:

$$P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2)$$

and hence:

(5.4)  $$1 - P(A_1 \cup A_2) = 1 - P(A_1) - P(A_2) + P(A_1 \cap A_2)$$

Now by probability theorems (1.9) and (1.10), we have:

$$1 - P(A_1 \cup A_2) = P(\overline{A_1 \cup A_2}) = P(\bar{A}_1 \cap \bar{A}_2)$$

$P(\bar{A}_1 \cap \bar{A}_2)$ is the probability that both confidence intervals are correct. We thus
have from (5.4):

(5.5)  $$P(\bar{A}_1 \cap \bar{A}_2) = 1 - P(A_1) - P(A_2) + P(A_1 \cap A_2)$$

Since $P(A_1 \cap A_2) \ge 0$, we obtain from (5.5) the Bonferroni inequality:

(5.6)  $$P(\bar{A}_1 \cap \bar{A}_2) \ge 1 - P(A_1) - P(A_2)$$




which for our situation is:

(5.6a)  $$P(\bar{A}_1 \cap \bar{A}_2) \ge 1 - \alpha - \alpha = 1 - 2\alpha$$

Thus, if $\beta_0$ and $\beta_1$ are separately estimated with, say, 95 percent confidence
intervals, the Bonferroni inequality guarantees us a family confidence coefficient
of at least 90 percent that both intervals based on the same sample are correct.

We can easily use the Bonferroni inequality (5.6a) to obtain a family confidence
coefficient of at least $1 - \alpha$ for estimating $\beta_0$ and $\beta_1$. We do this by
estimating $\beta_0$ and $\beta_1$ separately with statement confidence coefficients of
$1 - \alpha/2$ each, since by (5.6) the family confidence coefficient is then at least
$1 - \alpha/2 - \alpha/2 = 1 - \alpha$. Thus, the $1 - \alpha$ family confidence limits for $\beta_0$ and $\beta_1$, often
called a confidence set, are by the Bonferroni procedure:

(5.7)  $$b_0 \pm Bs(b_0) \qquad b_1 \pm Bs(b_1)$$

where:

(5.7a)  $$B = t(1 - \alpha/4;\ n - 2)$$

Note that a statement confidence coefficient of $1 - \alpha/2$ requires the
$(1 - \alpha/4)100$ percentile of the t distribution for a two-sided confidence interval.

Example. For the Westwood Company lot size application, 90 percent family
confidence intervals for $\beta_0$ and $\beta_1$ require $B = t(1 - .10/4;\ 8) = t(.975;\ 8) = 2.306$.
We have from before:

$b_0 = 10.0$    $s(b_0) = 2.50294$

$b_1 = 2.0$     $s(b_1) = .04697$

Hence, the two pairs of confidence limits are $10.0 \pm 2.306(2.50294)$ and
$2.0 \pm 2.306(.04697)$, and the joint confidence intervals are:

$$4.2282 \le \beta_0 \le 15.7718$$
$$1.8917 \le \beta_1 \le 2.1083$$

Thus, we conclude that $\beta_0$ is between 4.23 and 15.77 and $\beta_1$ is between 1.89 and
2.11. The family confidence coefficient is at least .90 that the procedure leads to
correct pairs of interval estimates.
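
The arithmetic of this example can be reproduced with a short Python sketch. The helper name `bonferroni_limits` is ours, and the multiple $B = 2.306$ is entered as the tabled value $t(.975; 8)$ quoted above rather than computed from the t distribution.

```python
def bonferroni_limits(estimate, std_error, b_multiple):
    """Return (lower, upper) limits estimate -/+ B * s(estimate), as in (5.7)."""
    half_width = b_multiple * std_error
    return estimate - half_width, estimate + half_width


B = 2.306  # t(1 - .10/4; 8) = t(.975; 8), tabled value quoted in the text

beta0_limits = bonferroni_limits(10.0, 2.50294, B)  # b0 and s(b0)
beta1_limits = bonferroni_limits(2.0, 0.04697, B)   # b1 and s(b1)

print([round(v, 4) for v in beta0_limits])
print([round(v, 4) for v in beta1_limits])
```

Plotted in the $(\beta_0, \beta_1)$ plane, the two intervals together describe a rectangle centered at $(b_0, b_1)$.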

Comments 

1. We reiterate that the Bonferroni $1 - \alpha$ family confidence coefficient is actually a
lower bound on the true (but often unknown) family confidence coefficient. To the extent
that incorrect interval estimates of $\beta_0$ and $\beta_1$ tend to pair up in the family (particularly
when the covariance between $b_0$ and $b_1$ is large), the families of statements will tend to be
correct more than $(1 - \alpha)100$ percent of the time.

2. The Bonferroni inequality (5.6a) can easily be extended to g simultaneous
confidence intervals, each with statement confidence coefficient $1 - \alpha$:

(5.8)  $$P\left(\bigcap_{i=1}^{g} \bar{A}_i\right) \ge 1 - g\alpha$$

Thus, if g interval estimates are desired with a family confidence coefficient $1 - \alpha$,
constructing each interval estimate with statement confidence coefficient $1 - \alpha/g$ will
suffice.

3.  For  a  given  family  confidence  coefficient,  the  larger  the  number  of  confidence 
intervals  in  the  family,  the  greater  becomes  the  multiple  B,  which  may  make  some  or  all 
of  the  confidence  intervals  too  wide  to  be  helpful.  The  Bonferroni  technique  is  ordinarily 
most  useful  when  the  number  of  simultaneous  estimates  is  not  too  large. 

4. It is not necessary with the Bonferroni procedure that the confidence intervals have
the same statement confidence coefficient. Different statement confidence coefficients
can be used, depending on the importance of each estimate. For instance, in our earlier
illustration $\beta_0$ might be estimated with a 92 percent confidence interval and $\beta_1$ with a 98
percent confidence interval. The family confidence coefficient by (5.6) will still be at least
90 percent.
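
The bookkeeping in Comments 2 and 4 can be sketched as follows. The function names are ours and purely illustrative; the code only applies the Bonferroni inequalities (5.6) and (5.8) and does not compute any t percentiles.

```python
def equal_split_statement_confidence(family_alpha, g):
    """Statement confidence coefficient 1 - alpha/g for each of g intervals."""
    return 1.0 - family_alpha / g


def t_percentile_needed(family_alpha, g):
    """Percentile of the t distribution needed for two-sided intervals: 1 - alpha/(2g)."""
    return 1.0 - family_alpha / (2.0 * g)


def family_lower_bound(statement_confidences):
    """Bonferroni lower bound: 1 minus the sum of the statement error rates."""
    return 1.0 - sum(1.0 - c for c in statement_confidences)


# Two intervals with a .90 family coefficient: each statement at .95, using t(.975; n - 2).
print(equal_split_statement_confidence(0.10, 2), t_percentile_needed(0.10, 2))

# Comment 4: unequal statement coefficients of .92 and .98.
print(round(family_lower_bound([0.92, 0.98]), 4))
```

As Comment 3 notes, the percentile returned by `t_percentile_needed` moves toward 1 as g grows, so the multiple B, and with it the width of every interval, increases with the size of the family.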

Comparison of two approaches. Figure 5.2 contains the joint 90 percent
confidence region by the Bonferroni approach for our example. Note that the
region is a rectangle since the Bonferroni approach does not utilize the existing
relationship between $b_0$ and $b_1$. The rectangle is centered at $(b_0, b_1)$. Also shown
in Figure 5.2 is the elliptical 90 percent joint confidence region which we
obtained earlier, which is also centered at $(b_0, b_1)$. The elliptical region is more
efficient in that it covers fewer $(\beta_0, \beta_1)$ points for the same confidence
coefficient. Nevertheless, the Bonferroni approach can be highly useful in many cases.
In multiple regression applications, in particular, the Bonferroni method comes
into its own because of the ease of obtaining and interpreting the joint estimates.

FIGURE 5.2  Bonferroni and elliptical joint 90 percent confidence regions for $\beta_0$ and
$\beta_1$: Westwood Company example

5.2  CONFIDENCE  BAND  FOR  REGRESSION  LINE 

At times we would like to obtain a confidence band for the regression line
$E(Y) = \beta_0 + \beta_1 X$ so that we can see in what region the entire regression line
lies. This differs from estimating $E(Y_h) = \beta_0 + \beta_1 X_h$ for a particular value of $X_h$
by an interval estimate, which we took up in Section 3.4.

To obtain a confidence band for the entire regression line, we essentially need
to consider the regression lines for all possible $(\beta_0, \beta_1)$ combinations in the
elliptical joint confidence region (5.1) for $\beta_0$ and $\beta_1$. For our Westwood Company
example, three possible $(\beta_0, \beta_1)$ combinations (Figure 5.1) and their
corresponding regression lines are:


$\beta_0$      $\beta_1$     $E(Y) = \beta_0 + \beta_1 X$
 4.00     2.11     $E(Y) = 4.00 + 2.11X$
 9.00     2.00     $E(Y) = 9.00 + 2.00X$
16.00     1.89     $E(Y) = 16.00 + 1.89X$

Figure 5.3 contains a plot of these three possible regression lines. As additional
lines for other possible $(\beta_0, \beta_1)$ combinations in the confidence region in
Figure 5.1 are plotted, we will fill out a confidence band for the regression line. The
boundaries of the confidence band are sketched in Figure 5.3 by the broken lines,
which are hyperbolas.

Working and Hotelling derived the formula for the $1 - \alpha$ hyperbolic confidence
band for the regression line. At any level $X_h$, the two boundary values of
the confidence band are:

(5.9)  $$\hat{Y}_h \pm Ws(\hat{Y}_h)$$

where:

(5.9a)  $$W^2 = 2F(1 - \alpha;\ 2,\ n - 2)$$

and $\hat{Y}_h$ and $s(\hat{Y}_h)$ are defined in (3.27) and (3.30), respectively.


FIGURE 5.3  Plot of three possible regression lines for joint confidence region in
Figure 5.1 (Y values not plotted to scale)

[Figure: vertical axis Man-Hours.]


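Let us find the boundary points of the confidence band at $X_h = 55$:

$$\hat{Y}_{55} = 10.0 + 2.0(55) = 120.0$$

$$s^2(\hat{Y}_{55}) = 7.5\left[\frac{1}{10} + \frac{(55 - 50)^2}{3,400}\right] = .80515$$

or:

$$s(\hat{Y}_{55}) = .89730$$

For a 90 percent confidence band, $W^2 = 2F(.90;\ 2,\ 8) = 2(3.11) = 6.22$, so that
$W = 2.494$. Hence, the 90 percent boundary points of the confidence band for the
regression line at $X_h = 55$ are $120.0 \pm 2.494(.89730)$ or:

$$117.8 \le \beta_0 + \beta_1 X_h \le 122.2$$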


In similar fashion, the boundary points at a number of other values of $X_h$ can
be developed and the boundary curves then sketched in. This has been done in
Figure 5.4, which contains the 90 percent confidence band for the regression line
for the Westwood Company example.
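
The boundary calculation (5.9) lends itself to a short Python sketch. The names below (`band_at`, `half_widths`) are ours, the summary values are those quoted for the Westwood Company example, and the F percentile 3.11 is the tabled value used earlier in this section, entered by hand rather than computed.

```python
import math

# Values quoted in the text for the Westwood Company example.
N = 10
B0, B1 = 10.0, 2.0
MSE = 7.5
X_BAR = 50.0
SXX = 3400.0    # sum of (X_i - X_bar)^2
F_CRIT = 3.11   # F(.90; 2, 8), tabled value quoted in the text (assumed, not computed)

W = math.sqrt(2.0 * F_CRIT)  # formula (5.9a)


def band_at(x_h):
    """Return (lower, upper) boundary values of the confidence band at x_h."""
    y_hat = B0 + B1 * x_h
    s_y_hat = math.sqrt(MSE * (1.0 / N + (x_h - X_BAR) ** 2 / SXX))
    half_width = W * s_y_hat
    return y_hat - half_width, y_hat + half_width


lower_55, upper_55 = band_at(55.0)
print(round(W, 3), round(lower_55, 1), round(upper_55, 1))

# Half-width of the band on a grid of lot sizes; it is smallest at X_bar.
half_widths = {x: (band_at(x)[1] - band_at(x)[0]) / 2.0 for x in range(20, 90, 10)}
for x, hw in half_widths.items():
    print(x, round(hw, 3))
```

The grid of half-widths shows the hyperbolic shape of the band: narrowest at $\bar{X} = 50$ and widening as $X_h$ moves away from the mean in either direction.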


FIGURE 5.4  90 percent confidence band for regression line: Westwood Company
example

[Figure: fitted line $\hat{Y}$ with upper and lower band boundaries $\hat{Y} \pm Ws(\hat{Y})$; horizontal axis
Lot Size X (20 to 80), vertical axis Man-Hours Y (50 to 175).]

Comments

Comments 

1. The boundary points (5.9) of the confidence band for the regression line are of
exactly the same form as the confidence limits for a single mean response $E(Y_h)$ in (3.32)
except that the t multiple has been replaced by the W multiple (named after Working).

2. If we had wished to estimate a single mean response with a 90 percent confidence
coefficient, the required t value would have been $t(.95;\ 8) = 1.860$. Note that this t
multiple, though smaller than the multiple $W = 2.494$ for the entire regression line, does not
differ by any major extent. This is typically the case, so that the boundary points of the
regression line band usually are not very much further apart than the confidence limits for
a single mean response $E(Y_h)$ at a given $X_h$ value. With the somewhat wider limits for the
entire regression line, one is able to draw conclusions about any and all mean responses
for the entire regression line and not just about the mean response at a given X level. One
use of this broader base for inferences will be explained in the next section.

3. The confidence band (5.9) applies to the entire regression line over all real-numbered
values of X from $-\infty$ to $+\infty$. The confidence coefficient indicates the percent of
time the estimating procedure will yield a band which covers the entire line, in a long
series of samples in which the X observations are kept at the same level as in the sample
actually taken.

In applications, the confidence band is ignored for that part of the regression line
which is not of interest in the problem at hand. In the Westwood Company example, for
instance, negative lot sizes would be ignored. The confidence coefficient for a limited
segment of the band of interest is somewhat higher than $1 - \alpha$, so that $1 - \alpha$ serves then
as a lower bound to the confidence coefficient.

4. Research continues on confidence banding of the regression line. An alternative
procedure to the Working-Hotelling one, for instance, gives a confidence band of uniform
width over a finite interval on the X axis centered around $\bar{X}$ (Ref. 5.1).

5.3  SIMULTANEOUS  ESTIMATION  OF  MEAN  RESPONSES 

Often  one  would  like  to  estimate  the  mean  responses  at  a  number  of  X  levels 
from  the  same  sample  data.  The  Westwood  Company,  for  instance,  may  wish  to 
estimate  the  mean  number  of  man-hours  for  lots  of  30,  55,  and  80  parts.  We 
already  know  how  to  do  this  for  any  one  level  of  X  with  given  statement  confi¬ 
dence  coefficient.  Now  we  shall  discuss  two  approaches  for  simultaneous  esti¬ 
mation  of  mean  responses  with  a  family  confidence  coefficient,  so  that  there  is  a 
known  assurance  of  all  estimates  of  mean  responses  being  correct.  The  two 
approaches  are  the  Working-Hotelling  approach  and  the  Bonferroni  approach. 

The reason for concern with a family confidence coefficient is that separate
interval estimates of $E(Y_h)$ at various levels $X_h$ need not all be correct or all be
incorrect, even though they are all based on the same sample data and fitted
regression line. The combination of sampling errors in $b_0$ and $b_1$ may be such
that the interval estimates of $E(Y_h)$ will be correct over some range of X levels
and incorrect elsewhere.


Working-Hotelling  approach 

Since the Working-Hotelling confidence band for the entire regression line
holds for all values of X, it certainly must hold for selected levels of the
independent variable. Hence, to obtain with the Working-Hotelling approach a
family of interval estimates of mean responses at different levels of X, with a
$1 - \alpha$ family confidence coefficient, we simply use formula (5.9) repetitively
to calculate boundary points of the confidence band at the various X levels being
considered. These boundary points serve then as the family confidence limits for
the interval estimates of the mean responses of interest.

Example. For the Westwood Company lot size example, suppose that we
require a family of estimates of the mean number of man-hours at the following
levels of lot size: 30, 55, 80. The family confidence coefficient is to be .90. We
obtained earlier $\hat{Y}_h$ and $s(\hat{Y}_h)$ for $X_h = 55$, and found W = 2.494. In similar
fashion, we can obtain the needed results for the other lot sizes. We summarize
them here, without showing the calculations:


  X_h    \hat{Y}_h    s(\hat{Y}_h)    W s(\hat{Y}_h)

   30       70.0        1.27764          3.1864
   55      120.0         .89730          2.2379
   80      170.0        1.65387          4.1248

Thus,  the  boundary  points  of  the  regression  line  band  at  Xh  =  30,  55,  and  80  are: 

66.8  =  70.0  -  3.1864  <  E(Y30)  <  70.0  +  3.1864  =  73.2 

117.8  =  120.0  -  2.2379  <  E(Y55)  <  120.0  +  2.2379  =  122.2 

165.9  =  170.0  -  4.1248  <  E(Y80)  <  170.0  +  4.1248  =  174.1 

With  family  confidence  coefficient  .90,  we  conclude  that  the  mean  number  of 
man-hours  for  lots  of  30  parts  is  between  66.8  and  73.2,  for  lots  of  55  parts  is 
between  117.8  and  122.2,  and  for  lots  of  80  parts  is  between  165.9  and  174.1. 
The  family  confidence  coefficient  .90  provides  assurance  that  the  procedure 
leads  to  all  correct  estimates  in  the  family  of  estimates. 
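
The arithmetic behind these boundary points can be retraced with a short script. The sketch below is only illustrative: it takes W = 2.494 and the tabulated values of the point estimates and their estimated standard deviations directly from the text, and the names `family_limits` and `estimates` are our own, not part of any package.

```python
# Working-Hotelling family limits for the Westwood example.
# W and the (Y_hat, s) pairs are the values quoted in the text.
W = 2.494

# lot size X_h -> (Y_hat_h, s(Y_hat_h))
estimates = {30: (70.0, 1.27764), 55: (120.0, 0.89730), 80: (170.0, 1.65387)}


def family_limits(multiple, table):
    """Return {X_h: (lower, upper)} from Y_hat_h +/- multiple * s(Y_hat_h)."""
    return {x: (y - multiple * s, y + multiple * s) for x, (y, s) in table.items()}


wh_limits = family_limits(W, estimates)
for x, (lower, upper) in wh_limits.items():
    print(f"X_h = {x}: {lower:.1f} < E(Y_h) < {upper:.1f}")
```

The same multiple W is applied at every level of X, which is why adding further levels to the family costs nothing in width under this approach.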


Bonferroni  approach 

The Bonferroni approach, discussed earlier for simultaneous estimation of $\beta_0$
and $\beta_1$, is a completely general approach. To construct a family of confidence
intervals for mean responses at different X levels, we calculate the usual
confidence limits for a single mean response $E(Y_h)$, given in (3.32), and adjust the
statement confidence coefficient to yield the specified family confidence
coefficient.

If $E(Y_h)$ is to be estimated for g levels of X, with a family confidence
coefficient of $1 - \alpha$, the Bonferroni confidence limits are:

(5.10)  $\hat{Y}_h \pm B\,s(\hat{Y}_h)$

where:

(5.10a)  $B = t(1 - \alpha/2g;\; n - 2)$

g is the number of confidence intervals in the family


Example. The estimates of the mean man-hours for lot sizes of 30, 55, and
80 parts with a family confidence coefficient of .90 by the Bonferroni approach
require the same data as the Working-Hotelling approach presented above. In
addition, we require $B = t(1 - .10/2(3);\; 8) = t(.983;\; 8)$. By linear
interpolation, we obtain $t(.983;\; 8) = 2.56$ (see the following Comment 4).

We  thus  obtain  the  confidence  intervals,  with  a  90  percent  family  confidence 
coefficient: 




66.7 = 70.0 - 2.56(1.27764) < E(Y30) < 70.0 + 2.56(1.27764) = 73.3

117.7 = 120.0 - 2.56(.89730) < E(Y55) < 120.0 + 2.56(.89730) = 122.3

165.8 = 170.0 - 2.56(1.65387) < E(Y80) < 170.0 + 2.56(1.65387) = 174.2
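
As a check on the Bonferroni limits, the following sketch recomputes them and compares the two multiples. It assumes only the figures quoted in the text (B = 2.56, W = 2.494, and the tabulated estimates); the variable names are illustrative.

```python
# Bonferroni family limits for the Westwood example, g = 3 statements.
B = 2.56    # t(.983; 8) by linear interpolation, as quoted in the text
W = 2.494   # Working-Hotelling multiple quoted in the text, for comparison

# lot size X_h -> (Y_hat_h, s(Y_hat_h))
estimates = {30: (70.0, 1.27764), 55: (120.0, 0.89730), 80: (170.0, 1.65387)}

bonf_limits = {x: (y - B * s, y + B * s) for x, (y, s) in estimates.items()}

for x, (lower, upper) in bonf_limits.items():
    print(f"X_h = {x}: {lower:.1f} < E(Y_h) < {upper:.1f}")

# The procedure with the smaller multiple gives the tighter limits.
tighter = "Working-Hotelling" if W < B else "Bonferroni"
print(f"W = {W}, B = {B}: {tighter} limits are tighter here")
```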


Comments 


1. In this instance the Working-Hotelling confidence limits are slightly tighter than the
Bonferroni limits. In other cases where the number of statements is small, the Bonferroni
limits may be tighter. For larger families, the Working-Hotelling confidence limits will
always be the tighter ones, since W in (5.9a) stays the same for any number of statements
in the family whereas B in (5.10a) becomes larger as the number of statements increases.
In practice, once the family confidence coefficient has been decided upon, one can calculate
the W and B multiples to determine which procedure leads to tighter confidence limits.

2.  Both  the  Working-Hotelling  and  Bonferroni  approaches  to  multiple  estimation  of 
mean  responses  provide  lower  bounds  to  the  actual  family  confidence  coefficient.  The 
reason why the Working-Hotelling approach furnishes a lower bound is that the
confidence coefficient $1 - \alpha$ actually applies to the entire line from $-\infty$ to $+\infty$.

3.  Sometimes  it  is  not  known  in  advance  for  which  levels  of  the  independent  variable 
to  estimate  the  mean  response.  That  is  determined  as  the  analysis  proceeds.  In  such  cases, 
it  is  better  to  use  the  Working-Hotelling  approach. 

4.  To  obtain  an  untabled  percentile  of  the  t  distribution,  linear  interpolation  in 
Table  A-2  ordinarily  will  give  a  reasonably  close  approximation  as  long  as  the  degrees  of 
freedom  are  not  minimal.  In  our  illustration  of  the  Bonferroni  method,  we  required 
t(.983;  8).  From  Table  A-2,  we  know  that: 

$t(.980;\; 8) = 2.449$        $t(.985;\; 8) = 2.634$

Linear interpolation therefore gives:

$t(.983;\; 8) = 2.449 + \frac{.983 - .980}{.985 - .980}\,(2.634 - 2.449) = 2.56$
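
The interpolation step can be written as a small helper. This is a minimal sketch; the function name `interpolate_percentile` is our own, and the two bracketing table entries are the ones quoted above.

```python
def interpolate_percentile(p, p_lo, t_lo, p_hi, t_hi):
    """Linearly interpolate a tabled percentile between two bracketing entries."""
    fraction = (p - p_lo) / (p_hi - p_lo)
    return t_lo + fraction * (t_hi - t_lo)


# Bracketing entries for 8 degrees of freedom, as quoted from Table A-2.
t_983 = interpolate_percentile(0.983, 0.980, 2.449, 0.985, 2.634)
print(round(t_983, 2))
```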


5.4  SIMULTANEOUS  PREDICTION  INTERVALS 
FOR  NEW  OBSERVATIONS 

Now  we  consider  the  simultaneous  prediction  of  g  new  observations  on  Y  in  g 
independent  trials  at  g  different  levels  of  X.  To  illustrate  this  type  of  application, 
let  us  suppose  the  Westwood  Company  plans  to  produce  the  next  three  lots  in 
sizes  of  30,  55,  and  80  parts,  and  wishes  to  predict  the  man-hours  for  each  of 
these  lots  with  a  family  confidence  coefficient  of  .95. 

Two procedures will be considered here, the Scheffé procedure and the
Bonferroni procedure. Both utilize the same type of limits as for predicting a
single observation, given in (3.35), and only the multiple of the estimated
standard deviation is changed. The Scheffé procedure uses the F distribution, while
the Bonferroni procedure uses the t distribution. The simultaneous prediction
limits for g predictions with the Scheffé procedure with family confidence
coefficient $1 - \alpha$ are:

(5.11)  $\hat{Y}_h \pm S\,s(Y_{h(new)})$

where:

(5.11a)  $S^2 = gF(1 - \alpha;\; g,\; n - 2)$

With the Bonferroni procedure, the $1 - \alpha$ simultaneous prediction limits are:

(5.12)  $\hat{Y}_h \pm B\,s(Y_{h(new)})$

where:

(5.12a)  $B = t(1 - \alpha/2g;\; n - 2)$

We  can  evaluate  the  S  and  B  multiples  to  see  which  procedure  provides  tighter 
prediction  limits.  For  our  example,  we  have: 

$S^2 = 3F(.95;\; 3,\; 8) = 3(4.07) = 12.21$   or   $S = 3.49$
$B = t(1 - .05/2(3);\; 8) = t(.992;\; 8) = 3.04$

so  that  the  Bonferroni  method  will  be  used  here.  From  earlier  results,  we  obtain 
(calculations  not  shown): 


  X_h    \hat{Y}_h    s(Y_{h(new)})    B s(Y_{h(new)})

   30       70.0         3.02198           9.18682
   55      120.0         2.88187           8.76088
   80      170.0         3.19926           9.72575

and  the  simultaneous  prediction  limits  are: 

60.8 = 70.0 - 9.18682 ≤ Y30(new) ≤ 70.0 + 9.18682 = 79.2

111.2 = 120.0 - 8.76088 ≤ Y55(new) ≤ 120.0 + 8.76088 = 128.8

160.3 = 170.0 - 9.72575 ≤ Y80(new) ≤ 170.0 + 9.72575 = 179.7

With family confidence coefficient at least .95, we can predict that the
man-hours for the next three production runs all will be within the above limits.
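
The choice between the two multiples and the resulting limits can be reproduced as follows. The sketch assumes the table values quoted in the text, F(.95; 3, 8) = 4.07 and t(.992; 8) = 3.04, together with the tabulated standard deviations; the variable names are illustrative.

```python
import math

g = 3              # number of simultaneous predictions
F_95_3_8 = 4.07    # F(.95; 3, 8), table value quoted in the text
B = 3.04           # t(.992; 8), value quoted in the text

S = math.sqrt(g * F_95_3_8)   # Scheffe multiple from (5.11a)
multiple = min(S, B)          # use whichever procedure gives tighter limits

# lot size X_h -> (Y_hat_h, s(Y_h(new)))
predictions = {30: (70.0, 3.02198), 55: (120.0, 2.88187), 80: (170.0, 3.19926)}

pred_limits = {x: (y - multiple * s, y + multiple * s)
               for x, (y, s) in predictions.items()}

print(f"S = {S:.2f}, B = {B:.2f}")
for x, (lower, upper) in pred_limits.items():
    print(f"X_h = {x}: {lower:.1f} to {upper:.1f}")
```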

Comments 

1. Simultaneous prediction intervals for g new observations on Y at g different levels
of X with a $1 - \alpha$ family confidence coefficient are wider than the corresponding single
prediction intervals of (3.35). When the number of simultaneous predictions is not large,
however, the difference in the width is only moderate. For instance, a single 95 percent
prediction interval for our example would have utilized the t multiple t(.975; 8) = 2.306,
which is only moderately smaller than the multiple B = 3.04 for three simultaneous
predictions.

2. Note that both B and S become larger as g increases. This contrasts with
simultaneous estimation of mean responses where B becomes larger but not W. When g is
large, both the B and S multiples may become so large that the prediction intervals will be
too wide to be useful. Other simultaneous estimation techniques could then be considered,
as discussed in Reference 5.2.

5.5  REGRESSION  THROUGH  THE  ORIGIN 

Sometimes the regression line is known to go through the origin at (0, 0). This
occurs, for instance, when X is units of output and Y is variable cost, so Y is zero
by definition when X is zero. Another example is where X is the number of
brands of cigarettes stocked in a supermarket in an experiment (including some
supermarkets with no brands stocked) and Y is the volume of cigarette sales in the
supermarket. The normal error model for these cases is the same as the general
model (3.1) except $\beta_0 = 0$:

(5.13)  $Y_i = \beta_1 X_i + \varepsilon_i$

where:

$\beta_1$ is a parameter
$X_i$ are known constants
$\varepsilon_i$ are independent $N(0, \sigma^2)$

The regression function for model (5.13) is:

(5.14)  $E(Y) = \beta_1 X$

The least squares estimator of $\beta_1$ is obtained by minimizing:

(5.15)  $Q = \sum (Y_i - \beta_1 X_i)^2$

with respect to $\beta_1$. The resulting normal equation is:

(5.16)  $\sum X_i (Y_i - b_1 X_i) = 0$

leading to the point estimator:

(5.17)  $b_1 = \frac{\sum X_i Y_i}{\sum X_i^2}$

$b_1$ as given in (5.17) is also the maximum likelihood estimator.
An unbiased estimator of E(Y) is:

(5.18)  $\hat{Y} = b_1 X$

Also, an unbiased estimator of $\sigma^2$ is:

(5.19)  $MSE = \frac{\sum (Y_i - \hat{Y}_i)^2}{n - 1} = \frac{\sum (Y_i - b_1 X_i)^2}{n - 1}$

The reason for the denominator $n - 1$ is that only one degree of freedom is lost
in estimating the single parameter of the regression equation (5.14).

Confidence intervals for $\beta_1$, $E(Y_h)$, and a new observation $Y_{h(new)}$ are shown
in Table 5.1. Note that the t multiple has $n - 1$ degrees of freedom here, the
degrees of freedom associated with MSE. The results in Table 5.1 are derived in
analogous fashion to the earlier results for our general model (3.1). Whereas for
the general case, we encounter terms $(X_i - \bar{X})^2$ or $(X_h - \bar{X})^2$, here we find $X_i^2$
and $X_h^2$ because of the regression through the origin.


TABLE 5.1  Confidence limits for regression through origin

Estimate of     Estimated Variance                                         Confidence Limits

$\beta_1$       $s^2(b_1) = \frac{MSE}{\sum X_i^2}$                         (5.20)  $b_1 \pm t\,s(b_1)$

$E(Y_h)$        $s^2(\hat{Y}_h) = \frac{X_h^2\,MSE}{\sum X_i^2}$            (5.21)  $\hat{Y}_h \pm t\,s(\hat{Y}_h)$

$Y_{h(new)}$    $s^2(Y_{h(new)}) = MSE\left[1 + \frac{X_h^2}{\sum X_i^2}\right]$    (5.22)  $\hat{Y}_h \pm t\,s(Y_{h(new)})$

where:  $t = t(1 - \alpha/2;\; n - 1)$


Example 

The Charles Plumbing Supplies Company operates 12 warehouses. In an
attempt to tighten procedures for planning and control, a consultant studied the
relation between number of work units performed (X) and total variable labor
cost (Y) in the warehouses during a test period. The data are given in Table 5.2,
and the observations are shown as a scatter plot in Figure 5.5.

Model (5.13) for regression through the origin was employed since Y involves
variable costs only and the other conditions of the model appeared to be satisfied
as well. From Table 5.2, we have $\sum X_i Y_i = 894,714$ and $\sum X_i^2 = 190,963$. Hence:


$b_1 = \frac{\sum X_i Y_i}{\sum X_i^2} = \frac{894,714}{190,963} = 4.68527$


and the estimated regression function is:

$\hat{Y} = 4.68527X$


The  fitted  regression  line  is  plotted  in  Figure  5.5. 

To illustrate inferences for regression through the origin, suppose an interval
estimate of $\beta_1$ is desired with a 95 percent confidence coefficient. We obtain
(calculations not shown):

$MSE = \frac{\sum (Y_i - b_1 X_i)^2}{n - 1} = \frac{2,457.66}{11} = 223.42$


TABLE  5.2  Data  for  regression  through  origin — warehouse  example 


Warehouse    Work Units    Variable Labor
             Performed     Cost (dollars)
    i           X_i             Y_i            X_i Y_i       X_i^2

    1            20             114              2,280         400
    2           196             921            180,516      38,416
    3           115             560             64,400      13,225
    4            50             245             12,250       2,500
    5           122             575             70,150      14,884
    6           100             475             47,500      10,000
    7            33             138              4,554       1,089
    8           154             727            111,958      23,716
    9            80             375             30,000       6,400
   10           147             670             98,490      21,609
   11           182             828            150,696      33,124
   12           160             762            121,920      25,600

  Total       1,359           6,390            894,714     190,963



FIGURE  5.5  Scatter  plot  and  fitted  regression  through  origin — 
warehouse  example 


From Table 5.2, we have $\sum X_i^2 = 190,963$. Hence:

$s^2(b_1) = \frac{MSE}{\sum X_i^2} = \frac{223.42}{190,963} = .0011700$   or   $s(b_1) = .034205$

For a 95 percent confidence coefficient, we require t(.975; 11) = 2.201. The
confidence limits, by (5.20) in Table 5.1, are $b_1 \pm t\,s(b_1)$ or 4.68527 ±
2.201(.034205). The 95 percent confidence interval for $\beta_1$ therefore is:

$4.61 \le \beta_1 \le 4.76$

Thus,  with  95  percent  confidence,  it  is  estimated  that  the  mean  of  the  distribution 
of  total  variable  labor  costs  increases  by  somewhere  between  $4.61  and  $4.76  for 
each  additional  work  unit  performed. 
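
The calculations for the warehouse example can be retraced from the data in Table 5.2. The following is a minimal sketch using formulas (5.17), (5.19), and (5.20); the t percentile 2.201 is taken from the text rather than computed, and the variable names are our own.

```python
import math

# Table 5.2: work units performed (X) and variable labor cost in dollars (Y).
X = [20, 196, 115, 50, 122, 100, 33, 154, 80, 147, 182, 160]
Y = [114, 921, 560, 245, 575, 475, 138, 727, 375, 670, 828, 762]
n = len(X)

sum_xy = sum(x * y for x, y in zip(X, Y))
sum_xx = sum(x * x for x in X)

b1 = sum_xy / sum_xx                                  # (5.17)
sse = sum((y - b1 * x) ** 2 for x, y in zip(X, Y))
mse = sse / (n - 1)                                   # (5.19): n - 1 degrees of freedom
s_b1 = math.sqrt(mse / sum_xx)                        # Table 5.1, estimated variance of b1

t = 2.201                                             # t(.975; 11), quoted in the text
lower, upper = b1 - t * s_b1, b1 + t * s_b1           # (5.20)

print(f"b1 = {b1:.5f}, MSE = {mse:.2f}, s(b1) = {s_b1:.6f}")
print(f"95 percent interval for beta_1: {lower:.2f} to {upper:.2f}")
```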

Comments 

1. In linear regression through the origin, there is no property of the form
$\sum (Y_i - b_1 X_i) = \sum e_i = 0$. Consequently, the residuals usually will not sum to zero
here. The only property comes from the normal equation (5.16), namely, $\sum X_i e_i = 0$.

2. In interval estimation of $E(Y_h)$ or $Y_{h(new)}$, note that the intervals (5.21) and (5.22) in
Table 5.1 widen the farther $X_h$ is from the origin. The reason is that the value of the true
regression function is known precisely at the origin, so that the effect of the sampling
error in the slope $b_1$ becomes increasingly important the farther $X_h$ is from the origin.

3. Since only one regression parameter, $\beta_1$, must be estimated for the regression
function (5.14), simultaneous estimation methods are not required to make a family of
statements about several mean responses. For a given confidence coefficient $1 - \alpha$,
formula (5.21) can be used repetitively with the given sample results to generate a family of
statements for which the family confidence coefficient is still $1 - \alpha$.

4. Like any other model, model (5.13) should be evaluated for aptness. Even when it
is known that the regression function must go through the origin, the function might not be
linear or the variance of the error terms might not be constant. Often one cannot be sure in
advance that the regression function goes through the origin, and it is then safe practice to
use the general model (3.1). If the regression does go through the origin, $b_0$ will differ
from 0 only by a small sampling error, and unless the sample size is very small, use of
model (3.1) has no disadvantages of any consequence. If the regression does not go
through the origin, use of the general model (3.1) will avoid potentially serious
difficulties resulting from forcing the regression through the origin when this is not
appropriate.
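
The residual properties noted in Comment 1 can be checked numerically with the warehouse data of Table 5.2. This is only an illustrative sketch; the variable names are our own.

```python
# Table 5.2 data: work units performed (X) and variable labor cost (Y).
X = [20, 196, 115, 50, 122, 100, 33, 154, 80, 147, 182, 160]
Y = [114, 921, 560, 245, 575, 475, 138, 727, 375, 670, 828, 762]

# Least squares slope for regression through the origin, formula (5.17).
b1 = sum(x * y for x, y in zip(X, Y)) / sum(x * x for x in X)

residuals = [y - b1 * x for x, y in zip(X, Y)]

sum_e = sum(residuals)                                  # need not be zero
sum_xe = sum(x * e for x, e in zip(X, residuals))       # zero by (5.16)

print(f"sum of residuals        = {sum_e:.3f}")
print(f"sum of X_i * residual_i = {sum_xe:.6f}")
```

The second sum vanishes (up to rounding in floating-point arithmetic) because it is exactly the normal equation (5.16), while nothing forces the first sum to vanish when no intercept is fitted.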

5.6  EFFECT  OF  MEASUREMENT  ERRORS 

In our discussion of the regression model up to this point, we have not
explicitly considered the presence of measurement errors in either X or Y. We now
examine briefly the effect of measurement errors.

Measurement  errors  in  Y 

If random measurement errors are present in the dependent variable Y, no new
problems are created if these errors are uncorrelated and not biased (positive and
negative measurement errors tend to cancel out). Consider, for example, a study
of the relation between the time required to complete a task (Y) and the
complexity of the task (X). The time to complete the task may not be measured accurately
because the person operating the stopwatch may not do so at the precise instants
called for. As long as such measurement errors are of a random nature,
uncorrelated, and not biased, these measurement errors can simply be absorbed in the
model error term $\varepsilon$. The model error term reflects the composite effects of a large
number of factors not considered in the model, one of which simply would be
random errors due to inaccuracy in the process of measuring Y.

Measurement  errors  in  X 

Unfortunately, a different situation holds if the independent variable X is
known only with measurement error. Frequently, to be sure, X is known without
measurement error, as when the independent variable is price of a product,
number of variables in an optimization problem, or wage rate for a class of
employees. At other times, however, measurement errors may enter the value observed
for the independent variable, for instance, when it is pressure, temperature,
production line speed, or a person's age.

We shall use the latter illustration in our development of the nature of the
problem. Suppose we are regressing employees' piecework earnings on age. Let
$X_i$ denote the true age of the ith employee and $X_i^*$ the age given by the employee
on his or her employment record. Needless to say, the two are not always the
same. We define the measurement error $\delta_i$ as follows:

(5.23)  $\delta_i = X_i^* - X_i$

The regression model we would like to study is:

(5.24)  $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$




Since, however, we only observe $X_i^*$, model (5.24) becomes:

(5.25)  $Y_i = \beta_0 + \beta_1 (X_i^* - \delta_i) + \varepsilon_i$

where we make use of (5.23) in replacing $X_i$. We can rewrite (5.25) as follows:

(5.26)  $Y_i = \beta_0 + \beta_1 X_i^* + (\varepsilon_i - \beta_1 \delta_i)$

Model (5.26) may appear like an ordinary regression model, with independent
variable $X_i^*$ and error term $\varepsilon_i - \beta_1 \delta_i$, but it is not. The independent variable
observation $X_i^*$ is a random variable, which, as we shall see, is correlated with
the error term $\varepsilon_i - \beta_1 \delta_i$. Theorem (3.40) for the case of random independent
variables requires that the error term be independent of the independent variable.
Hence, the standard regression results are not applicable for model (5.26).

Intuitively, we know that $\varepsilon_i - \beta_1 \delta_i$ is not independent of $X_i^*$ since (5.23)
constrains $X_i^* - \delta_i$ to equal $X_i$. To determine the dependence formally, let us
assume:

(5.27a)  $E(\delta_i) = 0$

(5.27b)  $E(\varepsilon_i) = 0$

(5.27c)  $E(\delta_i \varepsilon_i) = 0$

Note that (5.27a) implies that $E(X_i^*) = E(X_i + \delta_i) = X_i$, and that (5.27c)
assumes the measurement error $\delta_i$ is not correlated with the model error $\varepsilon_i$ because
by (1.19a) we have $\sigma(\delta_i, \varepsilon_i) = E(\delta_i \varepsilon_i)$ since $E(\delta_i) = E(\varepsilon_i) = 0$ by (5.27a) and
(5.27b). We now wish to find the covariance:

$\sigma(X_i^*,\; \varepsilon_i - \beta_1 \delta_i) = E\{[X_i^* - E(X_i^*)][(\varepsilon_i - \beta_1 \delta_i) - E(\varepsilon_i - \beta_1 \delta_i)]\}$

$= E[(X_i^* - X_i)(\varepsilon_i - \beta_1 \delta_i)]$

$= E[\delta_i (\varepsilon_i - \beta_1 \delta_i)]$

$= E(\delta_i \varepsilon_i - \beta_1 \delta_i^2)$

Now $E(\delta_i \varepsilon_i) = 0$ by (5.27c), and $E(\delta_i^2) = \sigma^2(\delta_i)$ by (1.14a) because $E(\delta_i) = 0$
by (5.27a). We therefore obtain:

(5.28)  $\sigma(X_i^*,\; \varepsilon_i - \beta_1 \delta_i) = -\beta_1 \sigma^2(\delta_i)$

This  covariance  is  not  zero  if  there  is  a  linear  regression  relation  between  X 
and  Y. 
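
A small simulation can make result (5.28) concrete. The sketch below is purely illustrative: the parameter values, the error standard deviations, the normality of the errors, and the pattern of true X values are all assumptions of ours, not taken from the text. It estimates the covariance in (5.28) and compares the least squares slope computed from the true X values with the one computed from the error-laden observations.

```python
import random

random.seed(1)

# Hypothetical values chosen only for illustration.
beta0, beta1 = 10.0, 2.0
sigma_delta, sigma_eps = 1.0, 1.0
n = 21000


def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


X_true = [40 + (i % 7 - 3) for i in range(n)]            # fixed true values X_i
delta = [random.gauss(0, sigma_delta) for _ in range(n)]  # measurement errors
eps = [random.gauss(0, sigma_eps) for _ in range(n)]      # model errors
X_star = [x + d for x, d in zip(X_true, delta)]           # (5.23): X* = X + delta
Y = [beta0 + beta1 * x + e for x, e in zip(X_true, eps)]  # (5.24)

# Error term of (5.26) and its estimated covariance with X*,
# using X* - E(X*) = delta and E(error term) = 0.
comp_err = [e - beta1 * d for e, d in zip(eps, delta)]
cov_hat = sum(d * c for d, c in zip(delta, comp_err)) / n

slope_true_x = ols_slope(X_true, Y)      # regression on the true X
slope_observed_x = ols_slope(X_star, Y)  # regression on X measured with error

print(f"estimated covariance = {cov_hat:.3f} (theory: {-beta1 * sigma_delta**2})")
print(f"slope using true X   = {slope_true_x:.3f}")
print(f"slope using X*       = {slope_observed_x:.3f}")
```

Under these assumptions the slope obtained from the observed values falls noticeably short of the value used to generate the data, even with a very large sample, while the slope obtained from the true values does not.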

If standard least squares procedures are applied to model (5.26), the
estimators $b_0$ and $b_1$ are biased and also lack the property of consistency. Great
difficulties are encountered in developing unbiased estimators when there are
measurement errors in X. One approach is to impose severe conditions on the
problem, for example, to make fairly strong assumptions about the properties of
the distributions of $\delta_i$, the covariance of $\delta_i$ and $\varepsilon_i$, and so on. Another approach is
to use additional variables which are known to be related to the true value of X
but not with the errors of measurement $\delta$. Such variables are called "instrumental"
variables because they are used as an instrument in studying the relation
between X and Y. Instrumental variables make it possible to obtain consistent
estimators of the regression parameters.

Discussions  of  possible  approaches  and  further  references  will  be  found  in 
specialized  works  such  as  Reference  5.3  and  in  statistical  journals. 

Note 

It  may  be  asked  what  is  the  distinction  between  the  case  when  X  is  a  random  variable, 
considered  in  Chapter  3,  and  the  case  when  X  is  subject  to  random  measurement  errors, 
and  why  are  there  special  problems  with  the  latter.  When  X  is  a  random  variable,  it  is  not 
under  the  control  of  the  analyst  and  will  vary  at  random  from  trial  to  trial,  as  when  X  is 
the  number  of  persons  entering  a  store  in  a  day.  If  this  random  variable  X  is  not  subject  to 
measurement  errors,  however,  it  can  be  accurately  ascertained  for  a  given  trial.  Thus,  if 
there  are  no  measurement  errors  in  counting  the  number  of  persons  entering  a  store  in  a 
day,  the  analyst  has  accurate  information  to  study  the  relation  between  number  of  persons 
entering  the  store  and  sales,  even  though  the  levels  of  number  of  customers  which  actually 
occur  cannot  be  controlled.  If,  on  the  other  hand,  measurement  errors  are  present  in  the 
number  of  persons  entering  the  store,  a  distorted  picture  of  the  relation  between  number  of 
persons  and  sales  occurs  because  the  sales  observations  will  frequently  be  matched  against 
an  incorrect  number  of  customers.  This  distorting  effect  of  measurement  errors  is  present 
whether  X  is  fixed  or  random. 


Berkson  model 

There is one situation where measurement errors in X are no problem. This
case was first noted by Berkson (Ref. 5.4). Frequently, the independent variable
in an experiment is set at a target value. For instance, in an experiment on the
effect of temperature on typist productivity, the temperature may be set at target
levels of 68°F, 70°F, and so on, according to the temperature control on the
thermostat. The observed temperature $X_i^*$ is fixed here, while the actual
temperature $X_i$ is a random variable since the thermostat may not be completely accurate.
Similar situations exist when water pressure is set according to a gauge, or
employees of specified ages according to their employment records are selected
for a study.

In all of these cases, the observation $X_i^*$ is a fixed quantity, while the
unobserved true value $X_i$ is a random variable. The measurement error is, as before:

(5.29)  $\delta_i = X_i^* - X_i$

Here, however, there is no constraint on the relation between $X_i^*$ and $\delta_i$, since $X_i^*$
is a fixed quantity. Again, we assume that $E(\delta_i) = 0$.

Model (5.26), which we obtained when replacing $X_i$ by $X_i^* - \delta_i$, is still
applicable for the Berkson case:

(5.30)  $Y_i = \beta_0 + \beta_1 X_i^* + (\varepsilon_i - \beta_1 \delta_i)$

The expected value of the error term, $E(\varepsilon_i - \beta_1 \delta_i)$, is zero as before, since
$E(\varepsilon_i) = 0$ and $E(\delta_i) = 0$. Further, $\varepsilon_i - \beta_1 \delta_i$ is now independent of $X_i^*$, since $X_i^*$
is a constant for the Berkson case. Hence, the conditions of an ordinary
regression model are met:

1.  The error terms have expectation 0.

2.  The independent variable is a constant, and hence the error terms are
independent of it.

Thus, standard least squares procedures can be applied for the Berkson case
without modification, and the estimators $b_0$ and $b_1$ will be unbiased. If we can
make the standard normality and constant variance assumptions for the errors
$\varepsilon_i - \beta_1 \delta_i$, the usual tests and interval estimates can be utilized.
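
The Berkson case can also be illustrated by simulation. In the sketch below the target settings, parameter values, and normal error distributions are hypothetical choices of ours; the point is only that regressing Y on the fixed settings $X_i^*$ recovers a slope close to the value used to generate the data.

```python
import random

random.seed(2)

# Hypothetical values chosen only for illustration.
beta0, beta1 = 10.0, 2.0
n = 20000


def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


X_star = [68 + 2 * (i % 5) for i in range(n)]             # fixed target settings 68, 70, ..., 76
delta = [random.gauss(0, 1.0) for _ in range(n)]           # setting errors, E(delta) = 0
eps = [random.gauss(0, 1.0) for _ in range(n)]             # model errors
X_true = [xs - d for xs, d in zip(X_star, delta)]          # (5.29): X = X* - delta
Y = [beta0 + beta1 * x + e for x, e in zip(X_true, eps)]   # response depends on the true X

berkson_slope = ols_slope(X_star, Y)                       # fit model (5.30) using X*
print(f"slope from regression on X* = {berkson_slope:.3f} (generating value {beta1})")
```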


5.7  WEIGHTED  LEAST  SQUARES 


General  approach 

The least squares criterion (2.8):

$Q = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$

weights each observation equally. There are times, however, when some
observations should receive greater weight and others smaller weight. The
weighted least squares criterion for simple linear regression is:

(5.31)  $Q_w = \sum_{i=1}^{n} w_i (Y_i - \beta_0 - \beta_1 X_i)^2$

where $w_i$ is the given weight of the ith observation. Minimizing $Q_w$ with respect
to $\beta_0$ and $\beta_1$ leads to the normal equations:

(5.32)  $\sum w_i Y_i = b_0 \sum w_i + b_1 \sum w_i X_i$
        $\sum w_i X_i Y_i = b_0 \sum w_i X_i + b_1 \sum w_i X_i^2$

In turn, these can be solved for the weighted least squares estimators $b_0$ and $b_1$:

(5.33a)  $b_1 = \frac{\sum w_i X_i Y_i - \frac{\sum w_i X_i \sum w_i Y_i}{\sum w_i}}{\sum w_i X_i^2 - \frac{(\sum w_i X_i)^2}{\sum w_i}}$

(5.33b)  $b_0 = \frac{\sum w_i Y_i - b_1 \sum w_i X_i}{\sum w_i}$

Note that if all weights are equal so $w_i$ is identically equal to a constant, the
normal equations (5.32) for weighted least squares reduce to the ones for
unweighted least squares in (2.9) and the weighted least squares estimators (5.33)
reduce to the ones for unweighted least squares in (2.10).




Weighted least squares should be used when the error term variance is not
constant for all observations. The weight for an observation should then be the
reciprocal of the observation's error term variance:

(5.34)  $w_i = \frac{1}{\sigma_i^2}$

where $\sigma_i^2$ is the variance of the error term for the ith observation. Thus,
observations whose error terms are subject to large variation receive less weight and
observations whose error terms are subject to small variation receive more
weight.

Unfortunately, the error term variances $\sigma_i^2$ are usually unknown. However,
when the error term variance varies with the level of the independent variable in
a systematic fashion, this relation can be exploited. For instance, if the error term
variance $\sigma_i^2$ is proportional to $X_i^2$ so that $\sigma_i^2 = kX_i^2$ where k is a proportionality
factor, the weights $w_i$ would be:

(5.35)  $w_i = \frac{1}{kX_i^2}$

Since the proportionality constant k drops out of the normal equations (5.32), we
can simply use the weights:

(5.35a)  $w_i = \frac{1}{X_i^2}$

This particular weight relation of the error term variance being proportional to $X^2$
is frequently encountered in business, economic, and biological applications, as
in studies of savings regressed on income.

We  illustrate  these  concepts  by  an  example. 


Example 

The Nielsen Construction Company studied the relation between the size of a
bid in million dollars (X) and the cost to the firm of preparing the bid in thousand
dollars (Y), for 12 recent bids. The data are presented in Table 5.3, columns 1
and 2, and are shown in a scatter plot in Figure 5.6. The scatter plot strongly
suggests that the error variance increases with X. An analyst after conducting a
preliminary residual analysis concluded that the error variance is approximately
proportional to $X^2$. (See the following Comment 4 for a discussion of how to
study the relation between the error variance and the level of X.) The analyst,
therefore, decided to employ the regression model:

(5.36)  $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$   where $\sigma_i^2 = kX_i^2$

Column 3 in Table 5.3 contains the weights $w_i = 1/X_i^2$, and columns 4-6
contain the needed calculations. The least squares estimators (5.33a) and (5.33b)
are, using the data in Table 5.3 and noting that $w_i X_i^2 = 1$ here so that $\sum w_i X_i^2
= n = 12$:




TABLE  5.3  Regression  calculations  for  weighted  least  squares — bid  preparation 
example 


          (1)       (2)        (3)              (4)              (5)          (6)
          X_i       Y_i     w_i = 1/X_i^2   w_i X_i = 1/X_i    w_i Y_i    w_i X_i Y_i = Y_i/X_i

          2.13      15.5      .220415          .46948          3.41643        7.27700
          1.21      11.1      .683013          .82645          7.58144        9.17355
         11.00      62.6      .008264          .09091           .51733        5.69091
          6.00      35.4      .027778          .16667           .98334        5.90000
          5.60      24.9      .031888          .17857           .79401        4.44643
          6.91      28.1      .020943          .14472           .58850        4.06657
          2.97      15.0      .113367          .33670          1.70051        5.05051
          3.35      23.2      .089107          .29851          2.06728        6.92537
         10.39      42.0      .009263          .09625           .38905        4.04235
          1.10      10.0      .826446          .90909          8.26446        9.09091
          4.36      20.0      .052605          .22936          1.05210        4.58716
          8.00      47.5      .015625          .12500           .74219        5.93750

Total    63.02     335.3     2.098714         3.87171         28.09664       72.18826

$b_1 = \frac{72.18826 - \frac{(3.87171)(28.09664)}{2.098714}}{12 - \frac{(3.87171)^2}{2.098714}} = 4.19057$

$b_0 = \frac{28.09664 - 4.19057(3.87171)}{2.098714} = 5.65678$

Hence, the fitted regression line is:

$\hat{Y} = 5.6568 + 4.1906X$


FIGURE  5.6  Scatter  plot  and  fitted  weighted  regression  line — bid  preparation  example 




The  fitted  regression  line  is  shown  in  Figure  5.6  and  appears  to  be  a  reasonably 
good  fit  to  the  data. 

If we had fitted a straight line by unweighted least squares to the original data
in our example, we would have obtained the regression function:

$\hat{Y} = 4.2289 + 4.5153X$

This differs from the weighted least squares results, as will generally be the case.
The reason is that the weights $w_i = 1/X_i^2$ give more emphasis to observations for
smaller X (for which the distribution of Y has a smaller variance) and less
emphasis to observations for larger X (for which the distribution of Y has a larger
variance).
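
Both fits can be reproduced from columns 1 and 2 of Table 5.3 with a direct implementation of formulas (5.33a) and (5.33b). The sketch below is illustrative; the function name `weighted_fit` is our own. Setting all weights equal to 1 gives the unweighted fit.

```python
# Table 5.3, columns 1 and 2: bid size (X, million dollars) and
# preparation cost (Y, thousand dollars).
X = [2.13, 1.21, 11.00, 6.00, 5.60, 6.91, 2.97, 3.35, 10.39, 1.10, 4.36, 8.00]
Y = [15.5, 11.1, 62.6, 35.4, 24.9, 28.1, 15.0, 23.2, 42.0, 10.0, 20.0, 47.5]


def weighted_fit(x, y, w):
    """Weighted least squares estimates (b0, b1) from (5.33b) and (5.33a)."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    b1 = (swxy - swx * swy / sw) / (swxx - swx ** 2 / sw)
    b0 = (swy - b1 * swx) / sw
    return b0, b1


weights = [1 / x ** 2 for x in X]                        # (5.35a): w_i = 1 / X_i^2
b0_w, b1_w = weighted_fit(X, Y, weights)                 # weighted fit
b0_u, b1_u = weighted_fit(X, Y, [1.0] * len(X))          # equal weights: unweighted fit

print(f"weighted:   Y_hat = {b0_w:.4f} + {b1_w:.4f} X")
print(f"unweighted: Y_hat = {b0_u:.4f} + {b1_u:.4f} X")
```

Small differences in the last decimal place relative to the hand calculation can arise because the table entries are rounded, whereas the script carries full precision.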

While  the  estimates  obtained  by  unweighted  least  squares  are  unbiased,  as  are 
the  estimates  obtained  by  weighted  least  squares,  the  unweighted  least  squares 
estimates  are  subject  to  greater  sampling  variation.  In  our  example,  the  estimated 
standard  deviations  of  the  regression  coefficients  for  the  two  methods  are: 

Unweighted Least Squares      Weighted Least Squares

$s(b_0) = 3.2517$             $s(b_0) = .9652$
$s(b_1) = .5285$              $s(b_1) = .4037$


Comments 

1. The condition of the error variance not being constant over all observations is called
heteroscedasticity, in contrast to the condition of equal error variances, called
homoscedasticity.

2. When heteroscedasticity prevails but the other conditions of model (3.1) are met,
the estimators $b_0$ and $b_1$ obtained by ordinary least squares procedures are still unbiased
and consistent, but they are no longer minimum variance unbiased estimators, as
illustrated in the previous example.

3. Heteroscedasticity is inherent when the response in regression analysis follows a
distribution in which the variance is functionally related to the mean. (Significant
nonnormality in Y is encountered as well in most such cases.) Consider, in this
connection, a regression analysis where X is the speed of a machine which puts a plastic
coating on cable and Y is the number of blemishes in the coating per thousand-foot drum
of cable. If Y is Poisson distributed with a mean which increases as X increases, the
distributions of Y cannot have constant variance at all levels of X since the variance of
a Poisson variable equals the mean, which is increasing with X.

4. Replicate observations at several of the X levels are very desirable for obtaining
information about the nature of the relation of the error variance to X. When replicates are
not available, one may divide the residuals into, say, three or four groups of
approximately equal size, according to ascending or descending order of X, calculate the
variance of the residuals for each group, and examine the relation of these variances to
various functions of X, such as $\sqrt{X}$, X, and $X^2$. This rough analysis frequently will
be an adequate guide to decide about the approximate relation between the error variance
and the level of X.
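The rough grouping analysis described in Comment 4 can be sketched as follows. This is an illustration under stated assumptions, not a prescribed procedure: it reuses the bid preparation data from the weighted least squares example, splits the ordinary least squares residuals into three groups by ascending X, and the helper names `ols_fit` and `grouped_residual_variances` are hypothetical.

```python
from statistics import mean, variance

# Bid preparation data (X_i, Y_i) from the weighted least squares example.
X = [2.13, 1.21, 11.00, 6.00, 5.60, 6.91, 2.97, 3.35, 10.39, 1.10, 4.36, 8.00]
Y = [15.5, 11.1, 62.6, 35.4, 24.9, 28.1, 15.0, 23.2, 42.0, 10.0, 20.0, 47.5]


def ols_fit(x, y):
    """Ordinary least squares estimates (b0, b1) for a straight line."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def grouped_residual_variances(x, y, n_groups=3):
    """Return [(mean X in group, sample variance of residuals in group), ...]."""
    b0, b1 = ols_fit(x, y)
    # Pair each X with its residual and sort in ascending order of X.
    pairs = sorted((xi, yi - (b0 + b1 * xi)) for xi, yi in zip(x, y))
    size = len(pairs) // n_groups
    summary = []
    for g in range(n_groups):
        # The last group absorbs any leftover observations.
        stop = (g + 1) * size if g < n_groups - 1 else len(pairs)
        chunk = pairs[g * size:stop]
        summary.append((mean(p[0] for p in chunk),
                        variance([p[1] for p in chunk])))
    return summary


groups = grouped_residual_variances(X, Y)
for x_mean, resid_var in groups:
    print("mean X = %6.3f   residual variance = %8.3f" % (x_mean, resid_var))
```

Plotting or tabulating the group variances against $\sqrt{X}$, X, and $X^2$ for the group means then indicates which relation is most nearly proportional.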




Weighted  least  squares  by  means  of  transformations 


Many regression packages allow the user to perform a weighted least squares
analysis as an option, with the user specifying the weights. When this option is
not available, weighted least squares estimators can still be obtained by employing
unweighted least squares on specially transformed observations.

To illustrate that weighted least squares is equivalent to unweighted least
squares of specially transformed data, we consider again the weighted least
squares criterion (5.31) for the case $\sigma_i^2 = kX_i^2$ where $w_i = 1/X_i^2$:

$$Q_w = \sum w_i (Y_i - \beta_0 - \beta_1 X_i)^2 = \sum \frac{1}{X_i^2} (Y_i - \beta_0 - \beta_1 X_i)^2$$

so that:

(5.37)  $Q_w = \sum \left( \frac{Y_i}{X_i} - \frac{\beta_0}{X_i} - \beta_1 \right)^2$

We  shall  use  the  following  notation: 


(5.38)  $Y_i' = \frac{Y_i}{X_i} \qquad \beta_0' = \beta_1 \qquad X_i' = \frac{1}{X_i} \qquad \beta_1' = \beta_0$


We  can  then  express  (5.37)  as  follows: 

(5.39)  $Q_w = \sum (Y_i' - \beta_0' - \beta_1' X_i')^2$

which is the unweighted least squares criterion for the observations $X_i'$ and $Y_i'$
transformed according to (5.38).

The error variances for the model with the transformed variables are now
constant. This can be seen by multiplying the terms in the original model:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

by $\sqrt{w_i} = 1/X_i$:

$$\frac{Y_i}{X_i} = \frac{\beta_0}{X_i} + \beta_1 + \frac{\varepsilon_i}{X_i}$$

so that we obtain, using the notation in (5.38):

(5.40)  $Y_i' = \beta_0' + \beta_1' X_i' + \varepsilon_i'$

where:

$$\varepsilon_i' = \frac{\varepsilon_i}{X_i}$$

Note that (5.40) is the simple linear regression model using the transformed
variables. The variances of the error terms $\varepsilon_i'$ in model (5.40) are now constant,
as can be seen using (1.15b):

(5.41)  $\sigma^2(\varepsilon_i') = \sigma^2\left(\frac{\varepsilon_i}{X_i}\right) = \frac{1}{X_i^2}\sigma^2(\varepsilon_i) = \frac{1}{X_i^2}(kX_i^2) = k$


Hence, using unweighted least squares for the dependent variable $Y' = Y/X$
and the independent variable $X' = 1/X$ will yield the same results as a weighted
least squares analysis with weights $w_i = 1/X_i^2$. Once the unweighted least squares
estimators $b_0'$ and $b_1'$ are obtained for the transformed variables, they are related
to the weighted least squares estimators for the original variables using (5.38), so
that $b_0' = b_1$ and $b_1' = b_0$.
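The equivalence can be verified on the bid preparation data. This is a minimal sketch, assuming the same twelve observations as in the weighted least squares example; `ols_fit` is a hypothetical helper, and the swap of coefficients at the end follows the notation in (5.38).

```python
# Bid preparation data (X_i, Y_i) from the weighted least squares example.
X = [2.13, 1.21, 11.00, 6.00, 5.60, 6.91, 2.97, 3.35, 10.39, 1.10, 4.36, 8.00]
Y = [15.5, 11.1, 62.6, 35.4, 24.9, 28.1, 15.0, 23.2, 42.0, 10.0, 20.0, 47.5]


def ols_fit(x, y):
    """Unweighted least squares estimates (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Transformed variables: Y' = Y/X and X' = 1/X.
x_prime = [1.0 / xi for xi in X]
y_prime = [yi / xi for xi, yi in zip(X, Y)]

b0_prime, b1_prime = ols_fit(x_prime, y_prime)

# Map back to the original model: b0' estimates beta_1, b1' estimates beta_0.
b0, b1 = b1_prime, b0_prime
print("b0 = %.4f, b1 = %.4f" % (b0, b1))
```

The printed coefficients should agree, up to rounding, with the weighted least squares line fitted earlier for this example.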

Appropriate transformations of the observations for other variance relationships,
e.g., $\sigma_i^2 = kX_i$, can be found in the same fashion as explained here for
$\sigma_i^2 = kX_i^2$.


5.8  INVERSE  PREDICTIONS 

At  times,  a  regression  model  of  Y  on  X  is  used  to  make  a  prediction  of  the 
value  of  X  which  gave  rise  to  a  new  observation  Y.  This  is  known  as  an  inverse 
prediction.  We  illustrate  inverse  predictions  by  two  examples: 

1. A trade association analyst has regressed the selling price of a product (Y)
on its cost (X) for the 15 member firms of the association. The selling price
$Y_{h(new)}$ for another firm not belonging to the trade association is known, and it is
desired to estimate the cost $X_{h(new)}$ for this firm.

2. A regression analysis of the decrease in cholesterol level (Y) against
dosage of a new drug (X) has been conducted, based on observations for 50
patients. A physician is treating a new patient for whom the cholesterol level
should decrease by $Y_{h(new)}$. It is desired to estimate the appropriate dosage level
$X_{h(new)}$ to be administered to bring about the desired cholesterol level decrease
$Y_{h(new)}$.

In  inverse  predictions,  model  (3.1)  is  assumed  as  before: 

(5.42)  $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

The estimated regression function based on n observations is obtained as usual:

(5.43)  $\hat{Y} = b_0 + b_1 X$

A new observation $Y_{h(new)}$ becomes available, and it is desired to estimate the
level $X_{h(new)}$ which gave rise to this new observation. A natural point estimator is
obtained by solving (5.43) for X, given $Y_{h(new)}$:

(5.44)  $\hat{X}_{h(new)} = \frac{Y_{h(new)} - b_0}{b_1} \qquad b_1 \neq 0$

where $\hat{X}_{h(new)}$ denotes the point estimator of the new level $X_{h(new)}$. Figure 5.7
contains a representation of this point estimator for an example to be discussed
shortly. $\hat{X}_{h(new)}$ is, indeed, the maximum likelihood estimator of $X_{h(new)}$ for
regression model (3.1).

It can be shown that approximate $1 - \alpha$ confidence limits for $X_{h(new)}$ are:

(5.45)  $\hat{X}_{h(new)} \pm t(1 - \alpha/2; n - 2)\, s(\hat{X}_{h(new)})$

where:

(5.45a)  $s^2(\hat{X}_{h(new)}) = \frac{MSE}{b_1^2}\left[1 + \frac{1}{n} + \frac{(\hat{X}_{h(new)} - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$


Example

A medical researcher in an experiment employed a new method for measuring
low concentration of galactose (sugar) in the blood (Y) on 12 samples containing
known concentrations (X). Altogether, four concentration levels were used in the
experiment. Linear regression model (3.1) was fitted with the following results:

$n = 12 \qquad b_0 = -.100 \qquad b_1 = 1.017 \qquad MSE = .0272$

$s(b_1) = .0142 \qquad \bar{X} = 5.500 \qquad \bar{Y} = 5.492 \qquad \sum (X_i - \bar{X})^2 = 135$

$\hat{Y} = -.100 + 1.017X$

The data and the estimated regression line are plotted in Figure 5.7.


FIGURE 5.7  Scatter plot and fitted regression line, calibration example
(vertical axis: measured galactose concentration)




The researcher first wished to make sure that there is a linear association
between the two variables. A test of $H_0$: $\beta_1 = 0$ versus $H_a$: $\beta_1 \neq 0$ utilizing the
test statistic $t^* = b_1/s(b_1) = 1.017/.0142 = 71.6$ was conducted for $\alpha = .05$. Since
$t(.975; 10) = 2.228$, $|t^*| = 71.6 > 2.228$ and it was concluded that $\beta_1 \neq 0$, or
that a linear association exists between the measured concentration and the actual
concentration.

The researcher now wishes to use the regression relation for a new patient for
whom the new measurement procedure yielded $Y_{h(new)} = 6.52$. It is desired to
estimate the actual concentration $X_{h(new)}$ for this patient by means of a 95 percent
confidence interval.

Using (5.44) and (5.45a), we obtain:

$$\hat{X}_{h(new)} = \frac{6.52 - (-.100)}{1.017} = 6.509$$

$$s^2(\hat{X}_{h(new)}) = \frac{.0272}{(1.017)^2}\left[1 + \frac{1}{12} + \frac{(6.509 - 5.500)^2}{135}\right] = .0287$$

so that $s(\hat{X}_{h(new)}) = \sqrt{.0287} = .1694$. We require $t(.975; 10) = 2.228$, and
using (5.45), we obtain the confidence limits $6.509 \pm 2.228(.1694)$. Hence, the
95 percent confidence interval is:

$$6.13 \le X_{h(new)} \le 6.89$$

Thus,  it  can  be  concluded  with  95  percent  confidence  that  the  actual  galactose 
concentration  is  between  6.13  and  6.89.  This  is  approximately  a  ±6  percent 
error,  which  was  considered  reasonable  by  the  researcher. 
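The arithmetic of this example can be reproduced with a short script. This is a sketch under stated assumptions: the function name `inverse_prediction` and its argument names are hypothetical, the summary statistics are those reported in the example, and the t multiple 2.228 is taken from the text rather than computed.

```python
import math


def inverse_prediction(y_new, b0, b1, mse, n, x_bar, sxx, t_value):
    """Point estimate and approximate confidence limits per (5.44)-(5.45a).

    Hypothetical helper; t_value is t(1 - alpha/2; n - 2) supplied by the caller.
    """
    x_hat = (y_new - b0) / b1                                   # (5.44)
    s2 = (mse / b1 ** 2) * (1 + 1 / n + (x_hat - x_bar) ** 2 / sxx)  # (5.45a)
    half_width = t_value * math.sqrt(s2)                        # (5.45)
    return x_hat, s2, (x_hat - half_width, x_hat + half_width)


x_hat, s2, (lower, upper) = inverse_prediction(
    y_new=6.52, b0=-0.100, b1=1.017, mse=0.0272,
    n=12, x_bar=5.500, sxx=135.0, t_value=2.228,
)
print("X-hat = %.3f, s^2 = %.4f, interval = (%.2f, %.2f)"
      % (x_hat, s2, lower, upper))
```

Passing the t multiple in as an argument keeps the sketch free of any statistical library; a table lookup or a library quantile function could supply it for other confidence levels.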


Comments 


1. The inverse prediction problem is also known as a calibration problem since it is
applicable when inexpensive, quick, and approximate measurements (Y) are related to
precise, often expensive, and time-consuming measurements (X) based on n observations.
The resulting regression model is then used to estimate for a new approximate
measurement $Y_{h(new)}$ what is the precise measurement $X_{h(new)}$. We illustrated this use
in our previous example.

2. The approximate confidence interval (5.45) is appropriate if the quantity:

(5.46)  $\frac{[t(1 - \alpha/2; n - 2)]^2 MSE}{b_1^2 \sum (X_i - \bar{X})^2}$

is small, say less than .1. For our example, this quantity is:

$$\frac{(2.228)^2(.0272)}{(1.017)^2(135)} = .00097$$

so that the approximate confidence interval is appropriate.

3. Simultaneous prediction intervals based on g different new observed
measurements $Y_{h(new)}$, with a $1 - \alpha$ family confidence coefficient, are easily obtained
using either the Bonferroni or the Scheffé procedures discussed in Section 5.4. The value
of $t(1 - \alpha/2; n - 2)$ in (5.45) is replaced by either $B = t(1 - \alpha/2g; n - 2)$ or
$S = [gF(1 - \alpha; g, n - 2)]^{1/2}$.


5.9  CHOICE OF X LEVELS


When  regression  data  are  obtained  by  experiment,  the  levels  of  X  at  which 
observations  on  Y  are  to  be  taken  are  under  the  control  of  the  experimenter. 
Among  other  things,  the  experimenter  will  have  to  consider: 

1. How many levels of X should be investigated?

2. What shall the two extreme levels be?

3. How shall the other levels of X, if any, be spaced?

4. How many observations should be taken at each level of X?


There is no single answer to these questions, since different purposes of the
regression analysis lead to different answers. The possible objectives in regression
analysis are varied, as we have noted earlier. The main objective may be to
estimate the slope of the regression line, or in some cases to estimate the intercept.
In many cases, the main objective is to predict one or more new observations
or to estimate one or more mean responses. When the regression function is
curvilinear, the main objective may be to locate the maximum or minimum mean
response. At still other times, the main purpose is to determine the nature of the
regression function.

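To illustrate how the purpose affects the design, consider the variances of $b_0$,
$b_1$, $\hat{Y}_h$, and $Y_{h(new)}$ which were developed earlier:

(5.47)  $\sigma^2(b_0) = \sigma^2 \frac{\sum X_i^2}{n \sum (X_i - \bar{X})^2} = \sigma^2 \left[\frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2}\right]$

(5.48)  $\sigma^2(b_1) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}$

(5.49)  $\sigma^2(\hat{Y}_h) = \sigma^2 \left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$

(5.50)  $\sigma^2(Y_{h(new)}) = \sigma^2 \left[1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$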


The variance of the slope is minimized if $\sum (X_i - \bar{X})^2$ is maximized. This is
accomplished  by  using  two  levels  of  X,  at  the  two  extremes  for  the  scope  of  the 
model,  and  placing  half  of  the  observations  at  each  of  the  two  levels.  Of  course, 
if  one  were  not  sure  of  the  linearity  of  the  regression  function,  one  would  be 
hesitant  to  use  only  two  levels  since  they  would  provide  no  information  about 
possible  departures  from  linearity. 

If the main purpose is to estimate $\beta_0$, the number and placement of levels does
not matter as long as $\bar{X} = 0$. On the other hand, to estimate the mean response or
predict a new observation at $X_h$, it is best to use X levels so that $\bar{X} = X_h$. If a
number of mean responses are to be estimated or a number of new observations
are to be predicted, it would be best to spread out the X levels such that $\bar{X}$ is in
the center of the $X_h$ levels of interest.
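The effect of the design on these variances can be illustrated numerically. The sketch below is hypothetical throughout: it assumes n = 12 observations and a scope for X running from 0 to 10, and compares a two-level design at the extremes with six equally spaced levels, using the factors that multiply $\sigma^2$ in (5.48) and (5.49).

```python
def sum_sq_dev(levels):
    """Sum of squared deviations of the X levels about their mean."""
    x_bar = sum(levels) / len(levels)
    return sum((x - x_bar) ** 2 for x in levels)


def slope_variance_factor(levels):
    """Factor multiplying sigma^2 in (5.48)."""
    return 1.0 / sum_sq_dev(levels)


def mean_response_variance_factor(levels, x_h):
    """Factor multiplying sigma^2 in (5.49) at the level x_h."""
    n = len(levels)
    x_bar = sum(levels) / n
    return 1.0 / n + (x_h - x_bar) ** 2 / sum_sq_dev(levels)


# Hypothetical designs with 12 observations on the assumed range 0 to 10.
two_level = [0.0] * 6 + [10.0] * 6                  # half at each extreme
six_level = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0] * 2     # equally spaced, 2 per level

for name, design in (("two levels", two_level), ("six levels", six_level)):
    print("%-10s  sum sq dev = %6.1f  slope factor = %.5f  factor at X_h = 10: %.4f"
          % (name, sum_sq_dev(design), slope_variance_factor(design),
             mean_response_variance_factor(design, 10.0)))
```

Under these assumptions the two-level design more than halves the variance of the slope, but, as noted above, it provides no information about departures from linearity.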

Although the number and spacing of X levels depends very much on the major
purpose of the regression analysis, some general advice can be given, at least to
be  used  as  a  point  of  departure.  D.  R.  Cox,  a  well-known  statistician,  suggests  as 
follows: 


Use  two  levels  when  the  object  is  primarily  to  examine  whether  or  not .  .  .  (the 
independent  variable)  .  .  .  has  an  effect  and  in  which  direction  that  effect  is.  Use 
three  levels  whenever  a  description  of  the  response  curve  by  its  slope  and  curvature 
is  likely  to  be  adequate;  this  should  cover  most  cases.  Use  four  levels  if  further 
examination  of  the  shape  of  the  response  curve  is  important.  Use  more  than  four 
levels  when  it  is  required  to  estimate  the  detailed  shape  of  the  response  curve,  or 
when  the  curve  is  expected  to  rise  to  an  asymptotic  value,  or  in  general  to  show 
features  not  adequately  described  by  slope  and  curvature.  Except  in  these  last  cases 
it is generally satisfactory to use equally spaced levels with equal numbers of
observations per level [Ref. 5.5].


PROBLEMS 

5.1. When joint confidence intervals for $\beta_0$ and $\beta_1$ are developed by the Bonferroni
method with a family confidence coefficient of 90 percent, does this imply that 10
percent of the time the confidence interval for $\beta_0$ will be incorrect? That 5
percent of the time the confidence interval for $\beta_0$ will be incorrect and 5 percent of
the time that for $\beta_1$ will be incorrect? Discuss.

5.2.  Refer  to  Problem  3.1.  Suppose  the  student  combines  the  two  confidence  intervals 
into  a  confidence  set.  What  can  you  say  about  the  family  confidence  coefficient 
for  this  set? 

5.3.  Refer  to  Calculator  maintenance  Problems  2.16  and  3.5. 

a. Will $b_0$ and $b_1$ tend to err in the same direction or in opposite directions here?
What does this imply about the tilt of the elliptical joint confidence region for
$\beta_0$ and $\beta_1$ here?

b. Obtain a few boundary points of the joint confidence region for $\beta_0$ and $\beta_1$ and
plot the boundary. Use a 95 percent confidence coefficient. Interpret your
joint confidence region.

c. A consultant has suggested that $\beta_0$ should be zero and $\beta_1$ should equal 14.0.
Does your joint confidence region support this view?

d. Obtain Bonferroni joint confidence intervals for $\beta_0$ and $\beta_1$ using a 95 percent
family confidence coefficient. Do these intervals support the view of the
consultant in part (c)?

5.4.  Refer  to  Airfreight  breakage  Problem  2.17. 

a. Will $b_0$ and $b_1$ tend to err in the same direction or in opposite directions here?
What does this imply about the tilt of the elliptical joint confidence region for
$\beta_0$ and $\beta_1$ here?

b. Obtain a few boundary points of the joint confidence region for $\beta_0$ and $\beta_1$ and
plot the boundary. Use a 99 percent confidence coefficient. Interpret your
joint confidence region.

c. Obtain Bonferroni joint confidence intervals for $\beta_0$ and $\beta_1$ using a 99 percent
family confidence coefficient. Do the Bonferroni joint confidence intervals
provide substantially less precise information than the joint confidence region
in part (b)?

5.5.  Refer  to  Plastic  hardness  Problem  2.18. 

a. Obtain Bonferroni joint confidence intervals for $\beta_0$ and $\beta_1$ using a 90 percent
family confidence coefficient. Interpret your confidence intervals.

b. Are $b_0$ and $b_1$ positively or negatively correlated here? Is this reflected in your
joint confidence intervals in part (a)?

c.  What  is  the  meaning  of  the  family  confidence  coefficient  in  part  (a)? 

5.6.  Refer  to  Muscle  mass  Problem  2.23. 

a. Obtain Bonferroni joint confidence intervals for $\beta_0$ and $\beta_1$ using a 99 percent
family confidence coefficient. Interpret your confidence intervals.

b. Will $b_0$ and $b_1$ tend to err in the same direction or in opposite directions here?
Explain.

c. A researcher has suggested that $\beta_0$ should equal approximately 160 and that
$\beta_1$ should be between -1.9 and -1.5. Do the joint confidence intervals in
part (a) support this expectation?

5.7.  Refer  to  Airfreight  breakage  Problem  2.17.  Obtain  selected  boundary  points  for 
the  90  percent  confidence  band  for  the  true  regression  line.  Plot  the  confidence 
band,  together  with  the  estimated  regression  line.  How  precisely  do  we  estimate 
the  location  of  the  true  regression  line  here? 

5.8.  Refer  to  Plastic  hardness  Problem  2.18.  Obtain  selected  boundary  points  for  the 
95  percent  confidence  band  for  the  true  regression  line.  Plot  the  confidence  band, 
together  with  the  estimated  regression  line.  How  precisely  do  we  estimate  the 
location  of  the  true  regression  line  here? 

5.9.  Refer  to  Calculator  maintenance  Problems  2.16  and  3.5. 

a. Estimate the expected number of minutes spent when there are 3, 5, and 7
machines to be serviced, respectively. Use interval estimates with a 90 percent
family confidence coefficient based on the Working-Hotelling approach.

b. Two service calls for preventive maintenance are scheduled in which the
numbers of machines to be serviced are 4 and 7, respectively. A family of
prediction intervals for the times to be spent on these calls is desired with a
90 percent family confidence coefficient. Which procedure, Scheffé or
Bonferroni, will provide tighter prediction limits here?

c. Obtain the family of prediction limits required in part (b) using the more
efficient procedure.

5.10.  Refer  to  Airfreight  breakage  Problem  2.17. 

a. It is desired to obtain interval estimates of the mean number of broken ampules
when there are 0, 1, and 2 transfers for the shipment using a 95 percent
family confidence coefficient. Obtain the desired confidence intervals using
the Working-Hotelling approach.

b.  Are  the  confidence  intervals  obtained  in  part  (a)  more  efficient  than 
Bonferroni  intervals  here?  Explain. 

c. The next three shipments will make 0, 1, and 2 transfers, respectively. Obtain
prediction limits for the number of broken ampules for each of these three
shipments using the Scheffé procedure and a 95 percent family confidence
coefficient.

d.  Would  the  Bonferroni  procedure  have  been  more  efficient  in  developing  the 
prediction  intervals  in  part  (c)?  Explain. 

5.11.  Refer  to  Plastic  hardness  Problem  2.18. 

a. Management wishes to obtain interval estimates of the mean hardness when
the elapsed time is 40, 50, and 60 hours, respectively. Calculate the desired
confidence intervals using the Bonferroni procedure and a 90 percent family
confidence coefficient. What is the meaning of the family confidence coefficient
here?

b.  Is  the  Bonferroni  procedure  employed  in  part  (a)  the  most  efficient  one  that 
could  be  employed  here?  Explain. 

c.  The  next  two  test  items  will  be  measured  after  30  and  40  hours  of  elapsed 
time,  respectively.  Predict  the  hardness  for  each  of  these  two  items  using  the 
most  efficient  procedure  and  a  90  percent  family  confidence  coefficient. 

5.12.  Refer  to  Muscle  mass  Problem  2.23. 

a. The nutritionist is particularly interested in the mean muscle mass for women
aged 45, 55, and 65. Obtain joint confidence intervals for the means of
interest using the Working-Hotelling procedure and a 95 percent family
confidence coefficient.

b.  Is  the  Working-Hotelling  approach  the  most  efficient  one  to  be  employed  in 
part  (a)?  Explain. 

c.  Three  additional  women  aged  48,  59,  and  74  have  contacted  the  nutritionist. 
Predict  the  muscle  mass  for  each  of  these  three  women  using  the  Bonferroni 
approach  and  a  95  percent  family  confidence  coefficient. 

d. Subsequently, the nutritionist wishes to predict the muscle mass for a fourth
woman aged 64, with a family confidence coefficient of 95 percent for the
four predictions. Will the three prediction intervals in part (c) have to be
recalculated? Would this also be true if the Scheffé method had been used in
constructing the prediction intervals?

5.13.  A  behavioral  scientist  stated  recently:  “I  am  never  sure  whether  the  regression 
line  goes  through  the  origin.  Hence,  I  will  not  use  such  a  model.”  Comment. 

5.14. Typographical errors. Shown below are the number of galleys of type set (X)
and the dollar cost of correcting typographical errors (Y) in a random sample of
recent printing orders handled by a firm specializing in technical printing. Since Y
involves variable costs only, an analyst wished to determine whether the regression
through the origin model (5.13) is apt for studying the relation between the
two variables.

  i:     1    2    3    4    5    6    7    8    9   10   11   12
X_i:     7   12   10   10   14   25   30   25   18   10    4    6
Y_i:   128  213  191  178  250  446  540  457  324  177   75  107

a.  Fit  model  (5.13)  and  state  the  estimated  regression  function. 

b.  Plot  the  estimated  regression  function  and  the  data.  Does  a  linear  regression 
function  through  the  origin  appear  to  provide  a  good  fit  here?  Comment. 

c. In estimating costs of handling prospective orders, management has used a
standard of $17.50 per galley for the cost of correcting typographical errors.
Test whether or not this standard should be revised; use $\alpha = .02$. State the
alternatives, decision rule, and conclusion.




d.  Obtain  a  prediction  interval  for  the  correction  cost  on  a  forthcoming  job 
involving  10  galleys.  Use  a  confidence  coefficient  of  98  percent. 

5.15. Refer to Typographical errors Problem 5.14. Conduct a formal test for lack of fit
of linear regression through the origin; use $\alpha = .01$. State the alternatives,
decision rule, and conclusion.

5.16.  Refer  to  Grade  point  average  Problem  2.15.  Assume  that  linear  regression 
through  the  origin  model  (5.13)  is  appropriate. 

a.  Fit  model  (5.13)  and  state  the  estimated  regression  function. 

b. Estimate $\beta_1$ with a 95 percent confidence interval. Interpret your interval
estimate.

c.  Estimate  the  mean  freshman  GPA  for  students  whose  entrance  test  score  is 
5.7.  Use  a  95  percent  confidence  interval. 

5.17.  Refer  to  Grade  point  average  Problem  5.16. 

a.  Plot  the  fitted  regression  line  and  the  data.  Does  the  linear  regression  function 
through  the  origin  appear  to  be  a  good  fit  here? 

b. Conduct a formal test for lack of fit of linear regression through the origin;
use $\alpha = .005$. State the alternatives, decision rule, and conclusion. What is
the P-value of the test?

5.18.  Refer  to  Calculator  maintenance  Problem  2.16.  Assume  that  linear  regression 
through  the  origin  model  (5.13)  is  appropriate. 

a.  Obtain  the  estimated  regression  function. 

b. Estimate $\beta_1$ with a 90 percent confidence interval. Interpret your interval
estimate.

c. Predict the service time on a new call in which six machines are to be
serviced. Use a 90 percent prediction interval.

5.19.  Refer  to  Calculator  maintenance  Problem  5.18. 

a.  Plot  the  fitted  regression  line  and  the  data.  Does  the  linear  regression  function 
through  the  origin  appear  to  be  a  good  fit  here? 

b. Conduct a formal test for lack of fit of linear regression through the origin;
use $\alpha = .01$. State the alternatives, decision rule, and conclusion. What is the
P-value of the test?

5.20. Refer to Plastic hardness Problem 2.18. Suppose that errors arise in X because
the laboratory technician is instructed to measure the hardness of the ith specimen
($Y_i$) at a prerecorded elapsed time ($X_i$), but the timing is imperfect so the true
elapsed time varies at random from the prerecorded elapsed time. Will ordinary
least squares estimates be biased here? Discuss.

5.21. Computer-assisted learning. Data from a study of computer-assisted learning
by 12 students, showing the total number of responses in completing a lesson (X)
and the cost of computer time (Y, in cents), follow.

  i:    1   2   3   4   5   6   7   8   9  10  11  12
X_i:   16  14  22  10  14  17  10  13  19  12  18  11
Y_i:   77  70  85  50  62  70  52  63  88  57  81  54

a.  Fit  a  linear  regression  function  by  ordinary  least  squares,  obtain  the  residuals, 
and  prepare  a  plot  of  the  residuals  against  X.  What  does  the  residual  plot 
suggest? 




b.  Assume  that  erf  =  kXj,  and  use  weighted  least  squares  to  fit  the  linear  regres¬ 
sion  function.  Are  the  regression  coefficients  similar  to  those  obtained  in  part 
(a)  with  ordinary  least  squares?  How  do  the  standard  deviations  of  the  regres¬ 
sion  coefficients  compare? 

c.  What  transformation  of  the  variables,  if  used  with  ordinary  least  squares, 
would  have  yielded  the  same  regression  coefficients  as  weighted  least  squares 
here? 

5.22.  Refer  to  Machine  speed  Problem  4.21. 

a. Verify that the relation $\sigma_i^2 = kX_i$ holds approximately here.

b.  Use  weighted  least  squares  to  fit  the  linear  regression  function. 

c. Obtain an interval estimate of $\beta_1$ with a 95 percent confidence coefficient.

d.  What  transformation  of  the  variables  when  used  with  ordinary  least  squares 
would  have  yielded  the  same  regression  coefficients  as  weighted  least  squares 
here? 

5.23. Refer to Grade point average Problems 2.15 and 3.4. A new student earned a
grade point average of 3.4 in the freshman year.

a.  Obtain  a  90  percent  confidence  interval  for  the  student’s  entrance  test  score. 
Interpret  your  confidence  interval. 

b. Is criterion (5.46) met as to the appropriateness of the approximate
confidence interval?

5.24. Refer to Plastic hardness Problem 2.18. The measurement of a new test item
showed 298 Brinell units of hardness.

a. Obtain a 99 percent confidence interval for the elapsed time before the
hardness was measured. Interpret your confidence interval.

b. Is criterion (5.46) met as to the appropriateness of the approximate
confidence interval?


EXERCISES 

5.25. What simplification takes place in the boundary of the joint confidence region
(5.2) for $\beta_0$ and $\beta_1$ if the independent variable is so coded that $\bar{X} = 0$? What
happens to the shape of the boundary?

5.26. If the independent variable is so coded that $\bar{X} = 0$ and the normal error model
(3.1) applies, are $b_0$ and $b_1$ independent? Are the confidence intervals for $\beta_0$ and
$\beta_1$ then independent?

5.27.  Derive  an  extension  of  the  Bonferroni  inequality  (5.6a)  for  the  case  of  three 
statements,  each  with  statement  confidence  coefficient  1  —  a. 

5.28. Show that for the fitted least squares regression line through the origin (5.18),
$\sum X_i e_i = 0$.

5.29. Show that $\hat{Y}$ as defined in (5.18) for linear regression through the origin is an
unbiased estimator of $E(Y)$.

5.30. Derive the formula for $s^2(\hat{Y}_h)$ given in Table 5.1 for linear regression through the
origin.

5.31. (Calculus needed.) Derive the weighted least squares normal equations for fitting
a linear regression function when the variance of the error terms is proportional to
X, i.e., $\sigma^2(\varepsilon_i) = kX_i$. Check your results by making appropriate substitutions in
(5.32).

5.32. Express the weighted least squares estimator $b_1$ in (5.33a) in terms of the
deviations $X_i - \bar{X}_w$ and $Y_i - \bar{Y}_w$, where $\bar{X}_w$ and $\bar{Y}_w$ are weighted means.


PROJECTS

5.33. Refer to the SMSA data set and Project 2.38. Consider the regression relation of
number of active physicians to total population.

a. Obtain a joint confidence region for $\beta_0$ and $\beta_1$ and plot it; use a 95 percent
confidence coefficient.

b.  Plot  the  95  percent  confidence  band  for  the  regression  line.  Does  it  provide  a 
fairly  precise  indication  of  the  location  of  the  regression  line? 

5.34.  Refer  to  the  SENIC  data  set  and  Project  2.40.  Consider  the  regression  relation  of 
length  of  stay  to  infection  risk. 

a. Obtain a joint confidence region for $\beta_0$ and $\beta_1$ and plot it; use a 90 percent
confidence coefficient.

b.  Plot  the  90  percent  confidence  band  for  the  regression  line.  Does  it  provide  a 
fairly  precise  indication  of  the  location  of  the  regression  line? 

5.35. Five observations on Y are to be taken when X = 10, 20, 30, 40, and 50,
respectively. The true regression function is $E(Y) = 20 + 10X$. The error terms are
independent and normally distributed, with $E(\varepsilon_i) = 0$ and $\sigma^2(\varepsilon_i) = .8X_i$.

a. Generate a random Y observation for each X level, and calculate both the
ordinary and weighted least squares estimates of the regression coefficient $\beta_1$
in linear regression model (3.1).

b.  Repeat  part  (a)  200  times,  generating  new  random  numbers  each  time. 

c. Calculate the mean and variance of the 200 ordinary least squares estimates
of $\beta_1$, and do the same for the 200 weighted least squares estimates.

d.  Do  both  the  ordinary  least  squares  and  weighted  least  squares  estimators 
appear  to  be  unbiased?  Explain.  Which  estimator  appears  to  be  more  precise 
here?  Comment. 


CITED  REFERENCES 

5.1  Gafarian,  A.  V.  “Confidence  Bands  in  Straight  Line  Regression.”  Journal  of  the 
American  Statistical  Association  59  (1964),  pp.  182-213. 

5.2  Miller, Rupert G., Jr. Simultaneous Statistical Inference. 2d ed. New York:
Springer-Verlag, 1981, pp. 114-16.

5.3  Johnston,  J.  Econometric  Methods.  2d  ed.  New  York:  McGraw-Hill,  1971. 

5.4  Berkson,  J.  “Are  There  Two  Regressions?”  Journal  of  the  American  Statistical 
Association  45  (1950),  pp.  164-80. 

5.5  Cox,  D.  R.  Planning  of  Experiments.  New  York:  John  Wiley  &  Sons,  1958, 
pp.  141-42. 


PART  II 


General  regression  and 
correlation  analysis 


Matrix  approach  to 
simple  regression  analysis 


Matrix  algebra  is  widely  used  for  mathematical  and  statistical  analysis.  The 
matrix  approach  is  practically  a  necessity  in  multiple  regression  analysis,  since  it 
permits  extensive  systems  of  equations  and  large  arrays  of  data  to  be  denoted 
compactly  and  operated  upon  efficiently. 

In  this  chapter,  we  first  take  up  a  brief  introduction  to  matrix  algebra.  (A 
fuller  treatment  of  matrix  algebra  may  be  found  in  specialized  texts  such  as 
Reference  6.1.)  Then  we  apply  matrix  methods  to  the  simple  linear  regression 
model  discussed  in  Part  I.  While  matrix  algebra  is  not  really  required  for  simple 
regression  with  one  independent  variable,  the  application  of  matrix  methods  to 
this  case  will  provide  a  transition  to  multiple  regression  which  will  be  taken  up  in 
succeeding  chapters. 

Readers who are familiar with matrix algebra may wish to scan the introductory
parts of this chapter and focus upon the parts dealing with the use of matrix
methods in regression analysis.

6.1  MATRICES 

Definition  of  matrix 

A  matrix  is  a  rectangular  array  of  elements  arranged  in  rows  and  columns.  An 
example  of  a  matrix  is: 


$$
\begin{array}{lrr}
 & \text{Column 1} & \text{Column 2} \\
\text{Row 1} & 6{,}000 & 23 \\
\text{Row 2} & 13{,}000 & 47 \\
\text{Row 3} & 11{,}000 & 35
\end{array}
$$

The  elements  of  this  particular  matrix  are  numbers  representing  income  (column 
1)  and  age  (column  2)  of  three  persons.  The  elements  are  arranged  by  row 
(person)  and  column  (characteristic  of  person).  Thus,  the  element  in  the  first  row 
and  first  column  (6,000)  represents  the  income  of  the  first  person.  The  element  in 
the  first  row  and  second  column  (23)  represents  the  age  of  the  first  person.  The 
dimension of the matrix is 3 × 2, i.e., 3 rows by 2 columns. If we wanted to
present income and age for 1,000 persons in a matrix with the same format as the
one earlier, we would require a 1,000 × 2 matrix.

Other examples of matrices are:

$$
\begin{bmatrix} 1 & 0 \\ 5 & 10 \end{bmatrix}
\qquad
\begin{bmatrix} 4 & 7 & 12 & 16 \\ 3 & 15 & 9 & 8 \end{bmatrix}
$$

These two matrices have dimensions of 2 × 2 and 2 × 4, respectively. Note that
we always specify the number of rows first and then the number of columns in
giving the dimension of a matrix.

As  in  ordinary  algebra,  we  may  use  symbols  to  identify  the  elements  of  a 
matrix: 


$$
\begin{array}{c|ccc}
 & j = 1 & j = 2 & j = 3 \\ \hline
i = 1 & a_{11} & a_{12} & a_{13} \\
i = 2 & a_{21} & a_{22} & a_{23}
\end{array}
$$

Note that the first subscript identifies the row number and the second the
column number. We shall use the general notation $a_{ij}$ for the element in the
ith row and the jth column. In our above example, i = 1, 2 and j = 1, 2, 3.

A  matrix  may  be  denoted  by  a  symbol  such  as  A,  X,  or  Z.  The  symbol  is  in 
boldface  to  identify  that  it  refers  to  a  matrix.  Thus,  we  might  define  for  the 
above  matrix: 


$$
\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}
$$

Reference to the matrix A then implies reference to the 2 × 3 array just given.
Another notation for the matrix A just given is:

$$
\mathbf{A} = [a_{ij}] \qquad i = 1, 2;\ j = 1, 2, 3
$$

which avoids the need for writing out all elements of the matrix by stating only
the general element. This notation can only be used, of course, when the
elements of a matrix are symbols.

To  summarize,  a  matrix  with  r  rows  and  c  columns  will  be  represented  either 
in  full: 


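(6.1)
$$
\mathbf{A} = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1c} \\
a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2c} \\
\vdots & \vdots & & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{ic} \\
\vdots & \vdots & & \vdots & & \vdots \\
a_{r1} & a_{r2} & \cdots & a_{rj} & \cdots & a_{rc}
\end{bmatrix}
$$

or in the abbreviated form:

(6.2)
$$
\mathbf{A} = [a_{ij}] \qquad i = 1, \ldots, r;\ j = 1, \ldots, c
$$

or simply by a boldface symbol, such as A.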

Comments 

1. Do not think of a matrix as a number. It is a set of elements arranged in an
array. Only when the matrix has dimension 1 × 1 is there a single number in the
matrix, in which case one can think of it interchangeably as either a matrix or
a number.

2. The following is not a matrix:

$$
\begin{array}{cc}
14 & \\
8 & \\
10 & 15 \\
9 & 16
\end{array}
$$

since the numbers are not arranged in columns and rows.


Square  matrix 

A  matrix  is  said  to  be  square  if  the  number  of  rows  equals  the  number  of 
columns.  Two  examples  are: 

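$$
\begin{bmatrix} 4 & 7 \\ 3 & 9 \end{bmatrix}
\qquad
\begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{bmatrix}
$$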


Vector 

A  matrix  containing  only  one  column  is  called  a  column  vector  or  simply  a 
vector.  Two  examples  are: 


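$$
\mathbf{A} = \begin{bmatrix} 4 \\ 7 \\ 10 \end{bmatrix}
\qquad
\mathbf{C} = \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \\ c_5 \end{bmatrix}
$$

The vector A is a 3 × 1 matrix, and the vector C is a 5 × 1 matrix.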


A matrix containing only one row is called a row vector. Two examples are:

$$
\mathbf{B}' = [15 \quad 25 \quad 50] \qquad \mathbf{F}' = [f_1 \quad f_2]
$$

We use the prime symbol for row vectors for reasons to be seen shortly. Note
that the row vector B' is a 1 × 3 matrix and the row vector F' is a 1 × 2
matrix. A single subscript suffices to identify the elements of a vector.


Transpose 

The  transpose  of  a  matrix  A  is  another  matrix,  denoted  by  A',  that  is  obtained 
by  interchanging  corresponding  columns  and  rows  of  the  matrix  A. 

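For example, if:

$$
\underset{3 \times 2}{\mathbf{A}} = \begin{bmatrix} 2 & 5 \\ 7 & 10 \\ 3 & 4 \end{bmatrix}
$$

then the transpose A' is:

$$
\underset{2 \times 3}{\mathbf{A}'} = \begin{bmatrix} 2 & 7 & 3 \\ 5 & 10 & 4 \end{bmatrix}
$$

Note that the first column of A is the first row of A', and similarly the second
column of A is the second row of A'. Correspondingly, the first row of A has
become the first column of A', and so on. Note that the dimension of A,
indicated under the symbol A, becomes reversed for the dimension of A'.

As another example, consider:

$$
\underset{3 \times 1}{\mathbf{C}} = \begin{bmatrix} 4 \\ 7 \\ 10 \end{bmatrix}
\qquad
\underset{1 \times 3}{\mathbf{C}'} = [4 \quad 7 \quad 10]
$$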

Thus,  the  transpose  of  a  column  vector  is  a  row  vector,  and  vice  versa.  This  is 
the  reason  why  we  used  the  symbol  B'  earlier  to  identify  a  row  vector,  since  it 
may  be  thought  of  as  the  transpose  of  a  column  vector  B. 

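In general, we have:

$$
\underset{r \times c}{\mathbf{A}} =
\begin{bmatrix} a_{11} & \cdots & a_{1c} \\ \vdots & & \vdots \\ a_{r1} & \cdots & a_{rc} \end{bmatrix}
= [a_{ij}] \qquad i = 1, \ldots, r;\ j = 1, \ldots, c
$$

where i is the row index and j is the column index, and:

(6.3)
$$
\underset{c \times r}{\mathbf{A}'} =
\begin{bmatrix} a_{11} & \cdots & a_{r1} \\ \vdots & & \vdots \\ a_{1c} & \cdots & a_{rc} \end{bmatrix}
= [a_{ji}] \qquad j = 1, \ldots, c;\ i = 1, \ldots, r
$$

where j is now the row index and i is the column index.

Thus, the element in the ith row and jth column in A is found in the jth row and
ith column in A'.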
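
As a rough illustration of the transpose rule, the following Python sketch
stores a matrix as a list of rows and interchanges rows and columns. The
function name `transpose` and the list-of-rows representation are assumptions
made for this example only; they are not part of the text.

```python
def transpose(matrix):
    """Return the transpose of a matrix stored as a list of rows."""
    # zip(*matrix) collects the first entry of every row, then the second,
    # and so on, so each column of the input becomes a row of the output.
    return [list(column) for column in zip(*matrix)]


A = [[2, 5],
     [7, 10],
     [3, 4]]            # the 3 x 2 matrix from the transpose example

A_t = transpose(A)      # a 2 x 3 matrix
print(A_t)              # [[2, 7, 3], [5, 10, 4]]

# Transposing a column vector gives a row vector, and vice versa.
C = [[4], [7], [10]]
print(transpose(C))     # [[4, 7, 10]]
```

Applying `transpose` twice returns the original matrix, which is a quick way to
check the function.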


Equality  of  matrices 

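Two matrices A and B are said to be equal if they have the same dimension and if
all corresponding elements are equal. Conversely, if two matrices are equal,
their corresponding elements are equal. For example, if:

$$
\mathbf{A} = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}
\qquad
\mathbf{B} = \begin{bmatrix} 4 \\ 7 \\ 3 \end{bmatrix}
$$

then A = B implies:

$$
a_1 = 4 \qquad a_2 = 7 \qquad a_3 = 3
$$

Similarly, if:

$$
\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}
\qquad
\mathbf{B} = \begin{bmatrix} 17 & 2 \\ 14 & 5 \\ 13 & 9 \end{bmatrix}
$$

then A = B implies:

$$
\begin{array}{ll}
a_{11} = 17 & a_{12} = 2 \\
a_{21} = 14 & a_{22} = 5 \\
a_{31} = 13 & a_{32} = 9
\end{array}
$$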


Regression  examples 

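In regression analysis, one basic matrix is the vector Y, consisting of the n
observations on the dependent variable:

(6.4)
$$
\underset{n \times 1}{\mathbf{Y}} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
$$

Note that the transpose Y' is the row vector:

(6.5)
$$
\underset{1 \times n}{\mathbf{Y}'} = [Y_1 \quad Y_2 \quad \cdots \quad Y_n]
$$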

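Another basic matrix in regression analysis is the X matrix, which is defined
as follows for simple regression analysis:

(6.6)
$$
\underset{n \times 2}{\mathbf{X}} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}
$$

The matrix X consists of a column of 1's and a column containing the n values of
the independent variable X. Note that the transpose of X is:

(6.7)
$$
\underset{2 \times n}{\mathbf{X}'} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_n \end{bmatrix}
$$

For the Westwood Company lot size example, the Y and X matrices are (Table
2.1):

$$
\mathbf{Y} = \begin{bmatrix} 73 \\ 50 \\ \vdots \\ 132 \end{bmatrix}
\qquad
\mathbf{X} = \begin{bmatrix} 1 & 30 \\ 1 & 20 \\ \vdots & \vdots \\ 1 & 60 \end{bmatrix}
$$
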

6.2  MATRIX  ADDITION  AND  SUBTRACTION 


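Adding or subtracting two matrices requires that they have the same dimension.
The sum, or difference, of two matrices is another matrix whose elements each
consist of the sum, or difference, of the corresponding elements of the two
matrices. Suppose:

$$
\underset{3 \times 2}{\mathbf{A}} = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}
\qquad
\underset{3 \times 2}{\mathbf{B}} = \begin{bmatrix} 1 & 2 \\ 2 & 3 \\ 3 & 4 \end{bmatrix}
$$

then:

$$
\underset{3 \times 2}{\mathbf{A} + \mathbf{B}} =
\begin{bmatrix} 1 + 1 & 4 + 2 \\ 2 + 2 & 5 + 3 \\ 3 + 3 & 6 + 4 \end{bmatrix} =
\begin{bmatrix} 2 & 6 \\ 4 & 8 \\ 6 & 10 \end{bmatrix}
$$

Similarly:

$$
\underset{3 \times 2}{\mathbf{A} - \mathbf{B}} =
\begin{bmatrix} 1 - 1 & 4 - 2 \\ 2 - 2 & 5 - 3 \\ 3 - 3 & 6 - 4 \end{bmatrix} =
\begin{bmatrix} 0 & 2 \\ 0 & 2 \\ 0 & 2 \end{bmatrix}
$$

In general, if:

$$
\underset{r \times c}{\mathbf{A}} = [a_{ij}] \qquad \underset{r \times c}{\mathbf{B}} = [b_{ij}]
\qquad i = 1, \ldots, r;\ j = 1, \ldots, c
$$

then:

(6.8)
$$
\underset{r \times c}{\mathbf{A} + \mathbf{B}} = [a_{ij} + b_{ij}]
\qquad \text{and} \qquad
\underset{r \times c}{\mathbf{A} - \mathbf{B}} = [a_{ij} - b_{ij}]
$$

Formula (6.8) generalizes in an obvious way to addition and subtraction of more
than two matrices. Note also that A + B = B + A, as in ordinary algebra.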
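
The element-by-element rule in (6.8) can be sketched in a few lines of Python.
The helper names `add` and `subtract` and the list-of-rows storage are
assumptions for illustration; the numbers are those of the 3 × 2 example above.

```python
def _check_same_dimension(A, B):
    # Addition and subtraction are defined only for equal dimensions.
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrices must have the same dimension")


def add(A, B):
    """Element-by-element sum of two matrices, as in formula (6.8)."""
    _check_same_dimension(A, B)
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def subtract(A, B):
    """Element-by-element difference of two matrices, as in formula (6.8)."""
    _check_same_dimension(A, B)
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


A = [[1, 4], [2, 5], [3, 6]]
B = [[1, 2], [2, 3], [3, 4]]

print(add(A, B))        # [[2, 6], [4, 8], [6, 10]]
print(subtract(A, B))   # [[0, 2], [0, 2], [0, 2]]
```

Because each element is added separately, `add(A, B)` and `add(B, A)` give the
same result, mirroring A + B = B + A.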


Regression  example 

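The regression model:

$$
Y_i = E(Y_i) + \varepsilon_i \qquad i = 1, \ldots, n
$$

can be written compactly in matrix notation. First, let us define the vector of
mean responses:

(6.9)
$$
\underset{n \times 1}{E(\mathbf{Y})} = \begin{bmatrix} E(Y_1) \\ E(Y_2) \\ \vdots \\ E(Y_n) \end{bmatrix}
$$

and the vector of the error terms:

(6.10)
$$
\underset{n \times 1}{\boldsymbol{\varepsilon}} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
$$

Recalling the definition of the Y observation vector (6.4), we can write the
regression model as follows:

$$
\mathbf{Y} = E(\mathbf{Y}) + \boldsymbol{\varepsilon}
$$

because:

$$
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} =
\begin{bmatrix} E(Y_1) \\ E(Y_2) \\ \vdots \\ E(Y_n) \end{bmatrix} +
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix} =
\begin{bmatrix} E(Y_1) + \varepsilon_1 \\ E(Y_2) + \varepsilon_2 \\ \vdots \\ E(Y_n) + \varepsilon_n \end{bmatrix}
$$

Thus, the observations vector Y equals the sum of two vectors, a vector
containing the expected values and another containing the error terms.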




6.3  MATRIX  MULTIPLICATION 


Multiplication  of  a  matrix  by  a  scalar 


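A scalar is an ordinary number or a symbol representing a number. We frequently
encounter multiplication of a matrix by a scalar. In this, every element of the
matrix is multiplied by the scalar. For example, suppose the matrix A is given
by:

$$
\mathbf{A} = \begin{bmatrix} 2 & 7 \\ 9 & 3 \end{bmatrix}
$$

Then 4A, where 4 is the scalar, equals:

$$
4\mathbf{A} = 4 \begin{bmatrix} 2 & 7 \\ 9 & 3 \end{bmatrix} = \begin{bmatrix} 8 & 28 \\ 36 & 12 \end{bmatrix}
$$

Similarly, $\lambda\mathbf{A}$ equals:

$$
\lambda\mathbf{A} = \lambda \begin{bmatrix} 2 & 7 \\ 9 & 3 \end{bmatrix} =
\begin{bmatrix} 2\lambda & 7\lambda \\ 9\lambda & 3\lambda \end{bmatrix}
$$

where $\lambda$ denotes the scalar.

If every element of a matrix has a common factor, this factor can be taken
outside the matrix and treated as a scalar. For example:

$$
\begin{bmatrix} 9 & 27 \\ 15 & 18 \end{bmatrix} = 3 \begin{bmatrix} 3 & 9 \\ 5 & 6 \end{bmatrix}
$$

Similarly:

$$
\begin{bmatrix} \dfrac{5}{\lambda} & \dfrac{2}{\lambda} \\[2ex] \dfrac{3}{\lambda} & \dfrac{8}{\lambda} \end{bmatrix} =
\frac{1}{\lambda} \begin{bmatrix} 5 & 2 \\ 3 & 8 \end{bmatrix}
$$

In general, if $\mathbf{A} = [a_{ij}]$ and $\lambda$ is a scalar, we have:

(6.11)
$$
\lambda\mathbf{A} = \mathbf{A}\lambda = [\lambda a_{ij}]
$$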


Multiplication  of  a  matrix  by  a  matrix 


Multiplication  of  a  matrix  by  a  matrix  may  appear  somewhat  complicated  at 
first,  but  a  little  practice  will  make  it  into  a  routine  operation. 

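Consider the two matrices:

$$
\underset{2 \times 2}{\mathbf{A}} = \begin{bmatrix} 2 & 5 \\ 4 & 1 \end{bmatrix}
\qquad
\underset{2 \times 2}{\mathbf{B}} = \begin{bmatrix} 4 & 6 \\ 5 & 8 \end{bmatrix}
$$

The product AB will be a 2 × 2 matrix whose elements are obtained by finding
the cross products of rows of A with columns of B and summing the cross
products. For instance, to find the element in the first row and first column of
the product AB, we work with the first row of A and the first column of B, as
follows:

$$
\underbrace{\begin{bmatrix} \mathbf{2} & \mathbf{5} \\ 4 & 1 \end{bmatrix}}_{\mathbf{A}}
\underbrace{\begin{bmatrix} \mathbf{4} & 6 \\ \mathbf{5} & 8 \end{bmatrix}}_{\mathbf{B}}
=
\underbrace{\begin{bmatrix} \mathbf{33} & \cdot \\ \cdot & \cdot \end{bmatrix}}_{\mathbf{AB}}
$$

We take the cross products and sum:

$$
2(4) + 5(5) = 33
$$

The number 33 is the element in the first row and first column of the matrix AB.

To find the element in the first row and second column of AB, we work with the
first row of A and the second column of B:

$$
\underbrace{\begin{bmatrix} \mathbf{2} & \mathbf{5} \\ 4 & 1 \end{bmatrix}}_{\mathbf{A}}
\underbrace{\begin{bmatrix} 4 & \mathbf{6} \\ 5 & \mathbf{8} \end{bmatrix}}_{\mathbf{B}}
=
\underbrace{\begin{bmatrix} 33 & \mathbf{52} \\ \cdot & \cdot \end{bmatrix}}_{\mathbf{AB}}
$$

The sum of the cross products is:

$$
2(6) + 5(8) = 52
$$

Continuing this process, we find the product AB to be:

$$
\underset{2 \times 2}{\mathbf{AB}} =
\begin{bmatrix} 2 & 5 \\ 4 & 1 \end{bmatrix}
\begin{bmatrix} 4 & 6 \\ 5 & 8 \end{bmatrix} =
\begin{bmatrix} 33 & 52 \\ 21 & 32 \end{bmatrix}
$$

Let us consider another example:

$$
\underset{2 \times 3}{\mathbf{A}} = \begin{bmatrix} 1 & 3 & 4 \\ 0 & 5 & 8 \end{bmatrix}
\qquad
\underset{3 \times 1}{\mathbf{B}} = \begin{bmatrix} 3 \\ 5 \\ 2 \end{bmatrix}
$$

$$
\underset{2 \times 1}{\mathbf{AB}} =
\begin{bmatrix} 1 & 3 & 4 \\ 0 & 5 & 8 \end{bmatrix}
\begin{bmatrix} 3 \\ 5 \\ 2 \end{bmatrix} =
\begin{bmatrix} 26 \\ 41 \end{bmatrix}
$$

When obtaining the product AB, we say that A is postmultiplied by B or B is
premultiplied by A. The reason for this precise terminology is that
multiplication rules for ordinary algebra do not apply to matrix algebra. In
ordinary algebra, xy = yx. In matrix algebra, $\mathbf{AB} \neq \mathbf{BA}$
usually. In fact, even though the product AB may be defined, the product BA may
not be defined at all.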


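In general, the product AB is only defined when the number of columns in A
equals the number of rows in B so that there will be corresponding terms in the
cross products. Thus, in our previous two examples, we had:

$$
\underset{2 \times 2}{\mathbf{A}}\ \underset{2 \times 2}{\mathbf{B}} = \underset{2 \times 2}{\mathbf{AB}}
\qquad\qquad
\underset{2 \times 3}{\mathbf{A}}\ \underset{3 \times 1}{\mathbf{B}} = \underset{2 \times 1}{\mathbf{AB}}
$$

In each case the two inner dimensions (columns of A, rows of B) are equal, and
the two outer dimensions give the dimension of the product.

Note that the dimension of the product AB is given by the number of rows in A
and the number of columns in B. Note also that in the second case the product
BA would not be defined since the number of columns in B is not equal to the
number of rows in A:

$$
\underset{3 \times 1}{\mathbf{B}}\ \underset{2 \times 3}{\mathbf{A}}
\qquad \text{(inner dimensions 1 and 2 are unequal)}
$$
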

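Here is another example of matrix multiplication:

$$
\mathbf{AB} =
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}
\begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix}
=
\begin{bmatrix}
a_{11}b_{11} + a_{12}b_{21} + a_{13}b_{31} & a_{11}b_{12} + a_{12}b_{22} + a_{13}b_{32} \\
a_{21}b_{11} + a_{22}b_{21} + a_{23}b_{31} & a_{21}b_{12} + a_{22}b_{22} + a_{23}b_{32}
\end{bmatrix}
$$

In general, if A has dimension r × c and B has dimension c × s, the product AB
is a matrix of dimension r × s whose element in the ith row and jth column is:

$$
\sum_{k=1}^{c} a_{ik} b_{kj}
$$

so that:

(6.12)
$$
\underset{r \times s}{\mathbf{AB}} = \left[ \sum_{k=1}^{c} a_{ik} b_{kj} \right]
\qquad i = 1, \ldots, r;\ j = 1, \ldots, s
$$

Thus, in the foregoing example, the element in the first row and second column
of the product AB is:

$$
\sum_{k=1}^{3} a_{1k} b_{k2} = a_{11}b_{12} + a_{12}b_{22} + a_{13}b_{32}
$$

as indeed we found by taking the cross products of the elements in the first row
of A and second column of B and summing.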
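
Formula (6.12) translates almost directly into code. The sketch below is one
possible pure-Python version, assuming matrices are stored as lists of rows;
the name `matmul` is chosen for this example and is not from the text.

```python
def matmul(A, B):
    """Product AB: element (i, j) is the sum over k of A[i][k] * B[k][j]."""
    r, c = len(A), len(A[0])
    if len(B) != c:
        # The number of columns in A must equal the number of rows in B.
        raise ValueError("columns of A must equal rows of B")
    s = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(c)) for j in range(s)]
            for i in range(r)]


# First worked example: both matrices are 2 x 2.
print(matmul([[2, 5], [4, 1]], [[4, 6], [5, 8]]))   # [[33, 52], [21, 32]]

# Second worked example: (2 x 3) times (3 x 1) gives a 2 x 1 matrix.
A = [[1, 3, 4], [0, 5, 8]]
B = [[3], [5], [2]]
print(matmul(A, B))                                 # [[26], [41]]

# Reversing the order fails: B has 1 column but A has 2 rows.
try:
    matmul(B, A)
except ValueError as error:
    print("BA is not defined:", error)
```

The dimension check is what makes the order of multiplication matter: the same
two matrices can be conformable as AB but not as BA.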


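Additional examples

1.
$$
\begin{bmatrix} 4 & 2 \\ 5 & 8 \end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} =
\begin{bmatrix} 4a_1 + 2a_2 \\ 5a_1 + 8a_2 \end{bmatrix}
$$

2.
$$
[2 \quad 3 \quad 5] \begin{bmatrix} 2 \\ 3 \\ 5 \end{bmatrix} = [2^2 + 3^2 + 5^2] = [38]
$$

Here, the product is a 1 × 1 matrix, which is equivalent to a scalar. Thus, the
matrix product here equals the number 38.

3.
$$
\begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ 1 & X_3 \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} =
\begin{bmatrix} \beta_0 + \beta_1 X_1 \\ \beta_0 + \beta_1 X_2 \\ \beta_0 + \beta_1 X_3 \end{bmatrix}
$$
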

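Regression examples. Let us define the vector $\boldsymbol{\beta}$ of the
regression coefficients as follows:

(6.13)
$$
\underset{2 \times 1}{\boldsymbol{\beta}} = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}
$$

Then the product $\mathbf{X}\boldsymbol{\beta}$, where X is defined in (6.6), is
an n × 1 matrix:

(6.14)
$$
\underset{n \times 1}{\mathbf{X}\boldsymbol{\beta}} =
\begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} =
\begin{bmatrix} \beta_0 + \beta_1 X_1 \\ \beta_0 + \beta_1 X_2 \\ \vdots \\ \beta_0 + \beta_1 X_n \end{bmatrix}
$$

Since $\beta_0 + \beta_1 X_i = E(Y_i)$, we see that
$\mathbf{X}\boldsymbol{\beta}$ is the vector of expected values $E(Y_i)$ for the
simple linear regression model, i.e.,
$E(\mathbf{Y}) = \mathbf{X}\boldsymbol{\beta}$, where E(Y) is defined in (6.9).

Another product frequently needed is Y'Y, where Y is the vector of observations
on the dependent variable as defined in (6.4):

(6.15)
$$
\underset{1 \times 1}{\mathbf{Y}'\mathbf{Y}} =
[Y_1 \quad Y_2 \quad \cdots \quad Y_n]
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} =
[Y_1^2 + Y_2^2 + \cdots + Y_n^2] = \left[ \sum Y_i^2 \right]
$$

Note that Y'Y is a 1 × 1 matrix, or a scalar. We thus have a compact way of
writing a sum of squares: $\mathbf{Y}'\mathbf{Y} = \sum Y_i^2$.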


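We also will need X'X, which is a 2 × 2 matrix:

(6.16)
$$
\underset{2 \times 2}{\mathbf{X}'\mathbf{X}} =
\begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_n \end{bmatrix}
\begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix} =
\begin{bmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix}
$$

and X'Y, which is a 2 × 1 matrix:

(6.17)
$$
\underset{2 \times 1}{\mathbf{X}'\mathbf{Y}} =
\begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_n \end{bmatrix}
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} =
\begin{bmatrix} \sum Y_i \\ \sum X_i Y_i \end{bmatrix}
$$
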
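
To see the products X'X, X'Y, and Y'Y reduce to the familiar sums, one can
compute them for a tiny data set. The following Python sketch uses four
made-up observations (they are hypothetical and are not the Westwood data), and
the helper names `transpose` and `matmul` are assumptions for illustration.

```python
def transpose(M):
    """Interchange rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Matrix product; each element is a sum of row-by-column cross products."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must equal rows of B")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# Hypothetical data, invented for this illustration only.
x_values = [1, 2, 3, 4]
y_values = [2, 3, 5, 6]

X = [[1, x] for x in x_values]   # n x 2: a column of 1's and the X column
Y = [[y] for y in y_values]      # n x 1 observation vector

XtX = matmul(transpose(X), X)    # [[n, sum X], [sum X, sum X^2]]
XtY = matmul(transpose(X), Y)    # [[sum Y], [sum XY]]
YtY = matmul(transpose(Y), Y)    # [[sum Y^2]]

print(XtX)   # [[4, 10], [10, 30]]
print(XtY)   # [[16], [47]]
print(YtY)   # [[74]]
```

Each entry can be checked by hand: for instance, the sum of the squared X
values is 1 + 4 + 9 + 16 = 30, which is the lower right element of X'X.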

6.4  SPECIAL  TYPES  OF  MATRICES 

Certain  special  types  of  matrices  arise  regularly  in  regression  analysis.  We 
shall  consider  the  most  important  of  these. 


Symmetric  matrix 

If A = A', A is said to be symmetric. Thus, A below is symmetric:

$$
\mathbf{A} = \begin{bmatrix} 1 & 4 & 6 \\ 4 & 2 & 5 \\ 6 & 5 & 3 \end{bmatrix}
\qquad
\mathbf{A}' = \begin{bmatrix} 1 & 4 & 6 \\ 4 & 2 & 5 \\ 6 & 5 & 3 \end{bmatrix}
$$

Clearly, a symmetric matrix necessarily is square. Symmetric matrices arise
typically in regression analysis when we premultiply a matrix, say, X, by its
transpose, X'. The resulting matrix, X'X, is symmetric, as can readily be seen
from (6.16).


Diagonal  matrix 


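A diagonal matrix is a square matrix whose off-diagonal elements are all zeros,
such as:

$$
\mathbf{A} = \begin{bmatrix} a_1 & 0 & 0 \\ 0 & a_2 & 0 \\ 0 & 0 & a_3 \end{bmatrix}
\qquad
\mathbf{B} = \begin{bmatrix} 4 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 10 & 0 \\ 0 & 0 & 0 & 5 \end{bmatrix}
$$

We will often not show all zeros for a diagonal matrix, presenting it in the
form:

$$
\mathbf{A} = \begin{bmatrix} a_1 & & \mbox{\Large 0} \\ & a_2 & \\ \mbox{\Large 0} & & a_3 \end{bmatrix}
\qquad
\mathbf{B} = \begin{bmatrix} 4 & & & \mbox{\Large 0} \\ & 1 & & \\ & & 10 & \\ \mbox{\Large 0} & & & 5 \end{bmatrix}
$$

Two important types of diagonal matrices are the identity matrix and the scalar
matrix.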

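Identity matrix. The identity matrix or unit matrix is denoted by I. It is a
diagonal matrix whose elements on the main diagonal are all 1's. Premultiplying
or postmultiplying any r × r matrix A by the r × r identity matrix I leaves A
unchanged. For example:

$$
\mathbf{IA} =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} =
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
$$

Similarly, we have:

$$
\mathbf{AI} =
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} =
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
$$

Note that the identity matrix I therefore corresponds to the number 1 in
ordinary algebra, since we have there that $1 \cdot x = x \cdot 1 = x$.

In general, we have for any r × r matrix A:

(6.18)
$$
\mathbf{AI} = \mathbf{IA} = \mathbf{A}
$$

Thus, the identity matrix can be inserted or dropped from a matrix expression
whenever it is convenient to do so.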


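Scalar matrix. A scalar matrix is a diagonal matrix whose main-diagonal
elements are the same. Two examples of scalar matrices are:

$$
\begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}
\qquad
\begin{bmatrix} \lambda & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & \lambda \end{bmatrix}
$$

A scalar matrix can be expressed as $\lambda\mathbf{I}$, where $\lambda$ is the
scalar. For instance:

$$
\begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} = 2 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = 2\mathbf{I}
$$

$$
\begin{bmatrix} \lambda & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & \lambda \end{bmatrix} =
\lambda \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \lambda\mathbf{I}
$$

Multiplying an r × r matrix A by the r × r scalar matrix $\lambda\mathbf{I}$
is equivalent to multiplying A by the scalar $\lambda$.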


Vector  and  matrix  with  all  elements  1 

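A column vector with all elements 1 will be denoted by 1:

(6.19)
$$
\underset{r \times 1}{\mathbf{1}} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}
$$

and a square matrix with all elements 1 will be denoted by J:

(6.20)
$$
\underset{r \times r}{\mathbf{J}} = \begin{bmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{bmatrix}
$$

For instance, we have:

$$
\underset{3 \times 1}{\mathbf{1}} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
\qquad
\underset{3 \times 3}{\mathbf{J}} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}
$$

Note that for an n × 1 vector 1 we obtain:

$$
\underset{1 \times 1}{\mathbf{1}'\mathbf{1}} =
[1 \quad \cdots \quad 1] \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} = [n] = n
$$

and:

$$
\underset{n \times n}{\mathbf{1}\mathbf{1}'} =
\begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} [1 \quad \cdots \quad 1] =
\begin{bmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{bmatrix} =
\underset{n \times n}{\mathbf{J}}
$$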

Zero  vector 

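A zero vector is a column vector containing only zeros. It will be denoted
by 0:

(6.21)
$$
\underset{r \times 1}{\mathbf{0}} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
$$

For example, we have:

$$
\underset{3 \times 1}{\mathbf{0}} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
$$
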

6.5  LINEAR  DEPENDENCE  AND  RANK  OF  MATRIX 

Linear  dependence 

Consider the following matrix:

$$
\mathbf{A} = \begin{bmatrix} 1 & 2 & 5 & 1 \\ 2 & 2 & 10 & 6 \\ 3 & 4 & 15 & 1 \end{bmatrix}
$$

Let us think now of the columns of this matrix as vectors. Thus, we view A as
being made up of four column vectors. It happens here that the columns are
interrelated in a special manner. Note that the third column vector is a
multiple of the first column vector:

$$
\begin{bmatrix} 5 \\ 10 \\ 15 \end{bmatrix} = 5 \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}
$$

We say that the columns of A are linearly dependent. They contain redundant
information, so to speak, since one column can be obtained as a linear
combination of the others.

We define a set of column vectors to be linearly dependent if one vector can be
expressed as a linear combination of the others. If no vector in the set can be
so expressed, we define the set of vectors to be linearly independent. A more
general, though equivalent, definition for the c column vectors
$\mathbf{C}_1, \ldots, \mathbf{C}_c$ in an r × c matrix is:

(6.22)  When c scalars $\lambda_1, \ldots, \lambda_c$, not all zero, can be
found such that:

$$
\lambda_1 \mathbf{C}_1 + \lambda_2 \mathbf{C}_2 + \cdots + \lambda_c \mathbf{C}_c = \mathbf{0}
$$

where 0 denotes the zero column vector, the c column vectors are linearly
dependent. If the only set of scalars for which the equality holds is
$\lambda_1 = 0, \ldots, \lambda_c = 0$, the set of c column vectors is linearly
independent.


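To illustrate for our example, $\lambda_1 = 5$, $\lambda_2 = 0$,
$\lambda_3 = -1$, $\lambda_4 = 0$ leads to:

$$
5 \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}
+ 0 \begin{bmatrix} 2 \\ 2 \\ 4 \end{bmatrix}
- 1 \begin{bmatrix} 5 \\ 10 \\ 15 \end{bmatrix}
+ 0 \begin{bmatrix} 1 \\ 6 \\ 1 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
$$

Hence, the column vectors are linearly dependent. Note that some of the
$\lambda_j = 0$ here. It is only required for linear dependence that not all
$\lambda_j$ are zero.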


Rank  of  a  matrix 

The rank of a matrix is defined to be the maximum number of linearly
independent columns in the matrix. We know that the rank of A in our earlier
example cannot be 4, since the four columns are linearly dependent. We can,
however, find 3 columns (1, 2, and 4) which are linearly independent. There are
no scalars $\lambda_1, \lambda_2, \lambda_4$ such that
$\lambda_1 \mathbf{C}_1 + \lambda_2 \mathbf{C}_2 + \lambda_4 \mathbf{C}_4 = \mathbf{0}$
other than $\lambda_1 = \lambda_2 = \lambda_4 = 0$. Thus, the rank of A in our
example is 3.

The rank of a matrix is unique and can equivalently be defined as the maximum
number of linearly independent rows. It follows that the rank of an r × c
matrix cannot exceed min(r, c), the minimum of the two values r and c.
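
The rank of the example matrix can be checked numerically. The sketch below
uses Gaussian elimination with exact fractions, which is a standard technique
but is not the method described in the text; the function name `rank` is an
assumption for this illustration.

```python
from fractions import Fraction


def rank(matrix):
    """Count the pivots found by Gaussian elimination (exact arithmetic)."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    pivots = 0
    for col in range(n_cols):
        # Look for a nonzero entry in this column at or below the pivot row.
        pivot = next((r for r in range(pivots, n_rows) if rows[r][col] != 0),
                     None)
        if pivot is None:
            continue                      # this column adds no new pivot
        rows[pivots], rows[pivot] = rows[pivot], rows[pivots]
        for r in range(pivots + 1, n_rows):
            factor = rows[r][col] / rows[pivots][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[pivots])]
        pivots += 1
        if pivots == n_rows:
            break
    return pivots


A = [[1, 2, 5, 1],
     [2, 2, 10, 6],
     [3, 4, 15, 1]]

print(rank(A))                            # 3

# Columns 1 and 3 alone are linearly dependent (column 3 = 5 * column 1).
cols_1_and_3 = [[row[0], row[2]] for row in A]
print(rank(cols_1_and_3))                 # 1

# The combination 5*C1 + 0*C2 - 1*C3 + 0*C4 gives the zero vector.
combination = [5 * r[0] + 0 * r[1] - 1 * r[2] + 0 * r[3] for r in A]
print(combination)                        # [0, 0, 0]
```

Exact fractions are used so that a true zero is never mistaken for a small
rounding error, which matters when deciding whether a pivot exists.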


6.6  INVERSE  OF  A  MATRIX 

In ordinary algebra, the inverse of a number is its reciprocal. Thus, the
inverse of 6 is $\frac{1}{6}$. A number multiplied by its inverse always
equals 1:

$$
6 \cdot \frac{1}{6} = \frac{1}{6} \cdot 6 = 1
$$

$$
x \cdot \frac{1}{x} = x \cdot x^{-1} = x^{-1} \cdot x = 1
$$

In matrix algebra, the inverse of a matrix A is another matrix, denoted by
$\mathbf{A}^{-1}$, such that:

(6.23)
$$
\mathbf{A}^{-1}\mathbf{A} = \mathbf{A}\mathbf{A}^{-1} = \mathbf{I}
$$

where I is the identity matrix. Thus, again, the identity matrix I plays the
same role as the number 1 in ordinary algebra. An inverse of a matrix is defined
only for square matrices. Even so, many square matrices do not have an inverse.
If a square matrix does have an inverse, the inverse is unique.


Examples 

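1. The inverse of the matrix:

$$
\mathbf{A} = \begin{bmatrix} 2 & 4 \\ 3 & 1 \end{bmatrix}
$$

is:

$$
\mathbf{A}^{-1} = \begin{bmatrix} -.1 & .4 \\ .3 & -.2 \end{bmatrix}
$$

since:

$$
\mathbf{A}^{-1}\mathbf{A} =
\begin{bmatrix} -.1 & .4 \\ .3 & -.2 \end{bmatrix}
\begin{bmatrix} 2 & 4 \\ 3 & 1 \end{bmatrix} =
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \mathbf{I}
$$

or:

$$
\mathbf{A}\mathbf{A}^{-1} =
\begin{bmatrix} 2 & 4 \\ 3 & 1 \end{bmatrix}
\begin{bmatrix} -.1 & .4 \\ .3 & -.2 \end{bmatrix} =
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \mathbf{I}
$$

2. The inverse of the matrix:

$$
\mathbf{A} = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 2 \end{bmatrix}
$$

is:

$$
\mathbf{A}^{-1} = \begin{bmatrix} \frac{1}{3} & 0 & 0 \\ 0 & \frac{1}{4} & 0 \\ 0 & 0 & \frac{1}{2} \end{bmatrix}
$$

since:

$$
\mathbf{A}^{-1}\mathbf{A} =
\begin{bmatrix} \frac{1}{3} & 0 & 0 \\ 0 & \frac{1}{4} & 0 \\ 0 & 0 & \frac{1}{2} \end{bmatrix}
\begin{bmatrix} 3 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 2 \end{bmatrix} =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \mathbf{I}
$$

Note that the inverse of a diagonal matrix is a diagonal matrix consisting
simply of the reciprocals of the elements on the diagonal.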


Finding  the  inverse 

Up to this point, the inverse of a matrix A has been given, and we have only
checked to make sure it is the inverse by seeing whether or not
$\mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$. But how does one find the inverse, and
when does it exist?

An inverse of a square r × r matrix exists if the rank of the matrix is r. Such
a matrix is said to be nonsingular. An r × r matrix with rank less than r is
said to be singular, and does not have an inverse.

Finding the inverse of a matrix can often require a tremendous amount of
computing. We shall take the approach in this book that the inverse of a 2 × 2
matrix and a 3 × 3 matrix can be calculated by hand. For any larger matrix, one
ordinarily uses a computer or a programmable calculator to find the inverse,
unless the matrix is of a special form such as a diagonal matrix. It can be
shown that the inverses for 2 × 2 and 3 × 3 matrices are as follows:


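1. If:

$$
\underset{2 \times 2}{\mathbf{A}} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
$$

then:

(6.24)
$$
\underset{2 \times 2}{\mathbf{A}^{-1}} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} =
\begin{bmatrix} \dfrac{d}{D} & \dfrac{-b}{D} \\[2ex] \dfrac{-c}{D} & \dfrac{a}{D} \end{bmatrix}
$$

where:

$$
D = ad - bc
$$

D is called the determinant of the matrix A. If A were singular, its determinant
would equal zero and no inverse of A would exist.

2. If:

$$
\underset{3 \times 3}{\mathbf{B}} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & k \end{bmatrix}
$$

then:

(6.25)
$$
\underset{3 \times 3}{\mathbf{B}^{-1}} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & k \end{bmatrix}^{-1} =
\begin{bmatrix} A & B & C \\ D & E & F \\ G & H & K \end{bmatrix}
$$

where:

$$
\begin{array}{lll}
A = (ek - fh)/Z & B = -(bk - ch)/Z & C = (bf - ce)/Z \\
D = -(dk - fg)/Z & E = (ak - cg)/Z & F = -(af - cd)/Z \\
G = (dh - eg)/Z & H = -(ah - bg)/Z & K = (ae - bd)/Z
\end{array}
$$

and:

$$
Z = a(ek - fh) - b(dk - fg) + c(dh - eg)
$$

Z is called the determinant of the matrix B.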
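
The nine expressions in (6.25) are easy to mistype, so a mechanical check can
be helpful. The Python sketch below applies the formula with exact fractions;
the function name `inverse_3x3` and the test matrix `B` are hypothetical
choices made for this illustration.

```python
from fractions import Fraction


def inverse_3x3(M):
    """Inverse of a 3 x 3 matrix using the element formulas in (6.25)."""
    (a, b, c), (d, e, f), (g, h, k) = [[Fraction(v) for v in row] for row in M]
    # Z is the determinant; a zero determinant means the matrix is singular.
    Z = a * (e * k - f * h) - b * (d * k - f * g) + c * (d * h - e * g)
    if Z == 0:
        raise ValueError("matrix is singular; no inverse exists")
    return [
        [(e * k - f * h) / Z, -(b * k - c * h) / Z, (b * f - c * e) / Z],
        [-(d * k - f * g) / Z, (a * k - c * g) / Z, -(a * f - c * d) / Z],
        [(d * h - e * g) / Z, -(a * h - b * g) / Z, (a * e - b * d) / Z],
    ]


def matmul(P, Q):
    """Matrix product, used here only to verify the inverse."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)]
            for row in P]


B = [[2, 0, 1],
     [1, 3, 0],
     [0, 1, 4]]          # hypothetical nonsingular matrix (determinant 25)
B_inv = inverse_3x3(B)
identity_check = matmul(B_inv, B)

# The diagonal matrix from the earlier example inverts to reciprocals.
diag_inv = inverse_3x3([[3, 0, 0], [0, 4, 0], [0, 0, 2]])
```

If the formulas are entered correctly, `identity_check` is the 3 × 3 identity
matrix and `diag_inv` has 1/3, 1/4, and 1/2 on its diagonal.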

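Let us use (6.24) to find the inverse of:

$$
\mathbf{A} = \begin{bmatrix} 2 & 4 \\ 3 & 1 \end{bmatrix}
$$

We have:

$$
a = 2 \qquad b = 4 \qquad c = 3 \qquad d = 1
$$

$$
D = ad - bc = 2(1) - 4(3) = -10
$$

Hence:

$$
\mathbf{A}^{-1} =
\begin{bmatrix} \dfrac{1}{-10} & \dfrac{-4}{-10} \\[2ex] \dfrac{-3}{-10} & \dfrac{2}{-10} \end{bmatrix} =
\begin{bmatrix} -.1 & .4 \\ .3 & -.2 \end{bmatrix}
$$

as was given in an earlier example.

When an inverse $\mathbf{A}^{-1}$ has been obtained, either by hand calculations
or from a computer run, it is usually wise to compute
$\mathbf{A}^{-1}\mathbf{A}$ to check whether the product equals the identity
matrix, allowing for minor rounding departures from 0 and 1.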
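
The recommended check of computing the inverse times the original matrix can be
sketched as follows. This Python example applies rule (6.24) with exact
fractions, so no rounding departures arise; the name `inverse_2x2` is an
assumption for this illustration.

```python
from fractions import Fraction


def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix by rule (6.24)."""
    (a, b), (c, d) = [[Fraction(v) for v in row] for row in M]
    D = a * d - b * c                 # determinant
    if D == 0:
        raise ValueError("matrix is singular; no inverse exists")
    return [[d / D, -b / D],
            [-c / D, a / D]]


A = [[2, 4],
     [3, 1]]
A_inv = inverse_2x2(A)                # -1/10, 2/5 in row 1; 3/10, -1/5 in row 2

# Check: the product of the inverse and A should be the identity matrix.
check = [[sum(A_inv[i][k] * A[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
print(check == [[1, 0], [0, 1]])      # True
```

With floating-point arithmetic the same check would need a small tolerance
instead of exact equality.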


Regression  example 

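The principal inverse matrix encountered in regression analysis is the inverse
of the matrix X'X in (6.16):

$$
\mathbf{X}'\mathbf{X} = \begin{bmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix}
$$

Using rule (6.24), we have:

$$
a = n \qquad b = \sum X_i \qquad c = \sum X_i \qquad d = \sum X_i^2
$$

so that:

$$
D = n \sum X_i^2 - \left( \sum X_i \right)\left( \sum X_i \right)
  = n \left[ \sum X_i^2 - \frac{\left( \sum X_i \right)^2}{n} \right]
  = n \sum (X_i - \bar{X})^2
$$

Hence:

(6.26)
$$
(\mathbf{X}'\mathbf{X})^{-1} =
\begin{bmatrix}
\dfrac{\sum X_i^2}{n \sum (X_i - \bar{X})^2} & \dfrac{-\sum X_i}{n \sum (X_i - \bar{X})^2} \\[3ex]
\dfrac{-\sum X_i}{n \sum (X_i - \bar{X})^2} & \dfrac{n}{n \sum (X_i - \bar{X})^2}
\end{bmatrix}
$$

Since $\sum X_i = n\bar{X}$, we can simplify (6.26):

(6.27)
$$
(\mathbf{X}'\mathbf{X})^{-1} =
\begin{bmatrix}
\dfrac{\sum X_i^2}{n \sum (X_i - \bar{X})^2} & \dfrac{-\bar{X}}{\sum (X_i - \bar{X})^2} \\[3ex]
\dfrac{-\bar{X}}{\sum (X_i - \bar{X})^2} & \dfrac{1}{\sum (X_i - \bar{X})^2}
\end{bmatrix}
$$
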
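
Formula (6.27) can be checked on a small data set by multiplying the result by
X'X. The sketch below is a minimal illustration with four made-up X values
(hypothetical, not taken from any data set in the book); the function name
`xtx_inverse` is also an assumption.

```python
from fractions import Fraction


def xtx_inverse(x_values):
    """Inverse of X'X for simple regression, built from formula (6.27)."""
    n = len(x_values)
    x_bar = Fraction(sum(x_values), n)
    sxx = sum((x - x_bar) ** 2 for x in x_values)   # sum of (X_i - X bar)^2
    if sxx == 0:
        raise ValueError("all X values are equal; X'X is singular")
    sum_sq = sum(Fraction(x) ** 2 for x in x_values)
    return [[sum_sq / (n * sxx), -x_bar / sxx],
            [-x_bar / sxx, 1 / sxx]]


x_values = [1, 2, 3, 4]                 # hypothetical X levels
inv = xtx_inverse(x_values)             # rows: 3/2, -1/2 and -1/2, 1/5

# Build X'X directly from (6.16) and confirm the product is the identity.
n = len(x_values)
xtx = [[n, sum(x_values)],
       [sum(x_values), sum(x * x for x in x_values)]]
product = [[sum(inv[i][k] * xtx[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
print(product == [[1, 0], [0, 1]])      # True
```

The error raised when all X values coincide reflects the fact that the two
columns of X are then linearly dependent, so X'X has no inverse.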

Uses  of  inverse  matrix 


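In ordinary algebra, we solve an equation of the type:

$$
5y = 20
$$

by multiplying both sides of the equation by the inverse of 5, namely:

$$
\frac{1}{5}(5y) = \frac{1}{5}(20)
$$

We obtain:

$$
y = \frac{1}{5}(20) = 4
$$

In matrix algebra, if we have an equation:

$$
\mathbf{AY} = \mathbf{C}
$$

we correspondingly premultiply both sides by $\mathbf{A}^{-1}$, assuming A has
an inverse, and obtain:

$$
\mathbf{A}^{-1}\mathbf{AY} = \mathbf{A}^{-1}\mathbf{C}
$$

Since $\mathbf{A}^{-1}\mathbf{AY} = \mathbf{IY} = \mathbf{Y}$, we obtain:

$$
\mathbf{Y} = \mathbf{A}^{-1}\mathbf{C}
$$

To illustrate this use, suppose we have two simultaneous equations:

$$
\begin{aligned}
2y_1 + 4y_2 &= 20 \\
3y_1 + y_2 &= 10
\end{aligned}
$$

which can be written as follows in matrix notation:

$$
\begin{bmatrix} 2 & 4 \\ 3 & 1 \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} =
\begin{bmatrix} 20 \\ 10 \end{bmatrix}
$$

The solution of these equations then is:

$$
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} =
\begin{bmatrix} 2 & 4 \\ 3 & 1 \end{bmatrix}^{-1}
\begin{bmatrix} 20 \\ 10 \end{bmatrix}
$$

Earlier we found the required inverse, so we obtain:

$$
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} =
\begin{bmatrix} -.1 & .4 \\ .3 & -.2 \end{bmatrix}
\begin{bmatrix} 20 \\ 10 \end{bmatrix} =
\begin{bmatrix} 2 \\ 4 \end{bmatrix}
$$

Hence, $y_1 = 2$ and $y_2 = 4$ satisfy these two equations.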
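
The same solution can be reproduced in a few lines of Python. This is only a
sketch of the inverse-based approach for the 2 × 2 case, using exact
fractions; the function name `solve_2x2` is an assumption for illustration.

```python
from fractions import Fraction


def solve_2x2(A, C):
    """Solve AY = C for a 2 x 2 matrix A by computing Y = (inverse of A) C."""
    (a, b), (c, d) = A
    D = Fraction(a * d - b * c)          # determinant from rule (6.24)
    if D == 0:
        raise ValueError("A is singular; the equations have no unique solution")
    A_inv = [[d / D, -b / D],
             [-c / D, a / D]]
    # Premultiply the right-hand side vector by the inverse.
    return [sum(A_inv[i][k] * C[k] for k in range(2)) for i in range(2)]


solution = solve_2x2([[2, 4], [3, 1]], [20, 10])
print(solution == [2, 4])                # True: y1 = 2 and y2 = 4
```

Substituting back gives 2(2) + 4(4) = 20 and 3(2) + 4 = 10, confirming both
equations.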


6.7  SOME  BASIC  THEOREMS  FOR  MATRICES 

We  list  here,  without  proof,  some  basic  theorems  for  matrices  which  we  will 
utilize  in  later  work. 

(6.28)  A  +  B  =  B  +  A 

(6.29)  (A  +  B)  +  C  =  A  +  (B  +  C) 

(6.30)  (AB)C  =  A(BC) 


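(6.31)  $\mathbf{C}(\mathbf{A} + \mathbf{B}) = \mathbf{CA} + \mathbf{CB}$

(6.32)  $\lambda(\mathbf{A} + \mathbf{B}) = \lambda\mathbf{A} + \lambda\mathbf{B}$

(6.33)  $(\mathbf{A}')' = \mathbf{A}$

(6.34)  $(\mathbf{A} + \mathbf{B})' = \mathbf{A}' + \mathbf{B}'$

(6.35)  $(\mathbf{AB})' = \mathbf{B}'\mathbf{A}'$

(6.36)  $(\mathbf{ABC})' = \mathbf{C}'\mathbf{B}'\mathbf{A}'$

(6.37)  $(\mathbf{AB})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$

(6.38)  $(\mathbf{ABC})^{-1} = \mathbf{C}^{-1}\mathbf{B}^{-1}\mathbf{A}^{-1}$

(6.39)  $(\mathbf{A}^{-1})^{-1} = \mathbf{A}$

(6.40)  $(\mathbf{A}')^{-1} = (\mathbf{A}^{-1})'$
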
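
Although the theorems are stated without proof, two of them, (6.35) and (6.37),
can be spot-checked numerically. The Python sketch below does this for a single
pair of 2 × 2 matrices; it is a check on one example, not a proof, and the
helper names are assumptions for illustration.

```python
from fractions import Fraction


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)]
            for row in P]


def inverse_2x2(M):
    (a, b), (c, d) = [[Fraction(v) for v in row] for row in M]
    D = a * d - b * c
    if D == 0:
        raise ValueError("matrix is singular; no inverse exists")
    return [[d / D, -b / D], [-c / D, a / D]]


A = [[2, 4], [3, 1]]
B = [[4, 6], [5, 8]]

# (6.35): the transpose of a product reverses the order of the factors.
transpose_rule_holds = (transpose(matmul(A, B))
                        == matmul(transpose(B), transpose(A)))

# (6.37): the inverse of a product also reverses the order of the factors.
inverse_rule_holds = (inverse_2x2(matmul(A, B))
                      == matmul(inverse_2x2(B), inverse_2x2(A)))

print(transpose_rule_holds, inverse_rule_holds)   # True True
```

Swapping the order on the right-hand side, for example using the inverse of A
times the inverse of B, generally breaks the equality, which is why the
reversal in these theorems matters.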

6.8  RANDOM  VECTORS  AND  MATRICES 

A random vector or a random matrix contains elements which are random
variables. Thus, the observation vector Y in (6.4) is a random vector since the
$Y_i$ elements are random variables.


Expectation  of  random  vector  or  matrix 

Suppose we have n = 3 observations and are concerned with the observation
vector:

$$
\mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix}
$$

The expected value of Y is a vector, denoted by E(Y), which is defined as
follows:

$$
E(\mathbf{Y}) = \begin{bmatrix} E(Y_1) \\ E(Y_2) \\ E(Y_3) \end{bmatrix}
$$

Thus, the expected value of a random vector is a vector whose elements are the
expected values of the random variables which are the elements of the random
vector. Similarly, the expectation of a random matrix is a matrix whose elements
are the expected values of the corresponding random variables in the original
matrix. We encountered a vector of expected values earlier in (6.9).

In general, for a random vector Y the expectation is:

(6.41)
$$
\underset{n \times 1}{E(\mathbf{Y})} = [E(Y_i)] \qquad i = 1, \ldots, n
$$

and for a random matrix Y with dimension n × p, the expectation is:

(6.42)
$$
\underset{n \times p}{E(\mathbf{Y})} = [E(Y_{ij})] \qquad i = 1, \ldots, n;\ j = 1, \ldots, p
$$


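Regression example. Suppose the number of observations in a regression
application is n = 3. The three error terms $\varepsilon_1$, $\varepsilon_2$,
$\varepsilon_3$ each have expectation zero. For the error vector:

$$
\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \end{bmatrix}
$$

we have:

$$
E(\boldsymbol{\varepsilon}) = \mathbf{0}
$$

since:

$$
E(\boldsymbol{\varepsilon}) =
\begin{bmatrix} E(\varepsilon_1) \\ E(\varepsilon_2) \\ E(\varepsilon_3) \end{bmatrix} =
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} = \mathbf{0}
$$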


Variance-covariance  matrix  of  a  random  vector 

Consider again the random vector Y consisting of three observations $Y_1$, $Y_2$,
$Y_3$. Each random variable has a variance, $\sigma^2(Y_i)$, and any two random variables
have a covariance, $\sigma(Y_i, Y_j)$. We can assemble these in a matrix called the
variance-covariance matrix of Y, denoted by $\sigma^2(\mathbf{Y})$:

(6.43)
$$\sigma^2(\mathbf{Y}) = \begin{bmatrix} \sigma^2(Y_1) & \sigma(Y_1, Y_2) & \sigma(Y_1, Y_3) \\ \sigma(Y_2, Y_1) & \sigma^2(Y_2) & \sigma(Y_2, Y_3) \\ \sigma(Y_3, Y_1) & \sigma(Y_3, Y_2) & \sigma^2(Y_3) \end{bmatrix}$$

Note that the variances are on the main diagonal and the covariance $\sigma(Y_i, Y_j)$
is found in the $i$th row and $j$th column of the matrix. Thus, $\sigma(Y_2, Y_1)$ is found in
the second row, first column, and $\sigma(Y_1, Y_2)$ is found in the first row, second
column. Remember, of course, that $\sigma(Y_2, Y_1) = \sigma(Y_1, Y_2)$. Since in general
$\sigma(Y_i, Y_j) = \sigma(Y_j, Y_i)$ for $i \neq j$, $\sigma^2(\mathbf{Y})$ is a symmetric matrix.

It  follows  readily  that: 

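(6.44)  $\sigma^2(\mathbf{Y}) = E\{[\mathbf{Y} - E(\mathbf{Y})][\mathbf{Y} - E(\mathbf{Y})]'\}$

For our illustration, we have:

$$\sigma^2(\mathbf{Y}) = E\left\{ \begin{bmatrix} Y_1 - E(Y_1) \\ Y_2 - E(Y_2) \\ Y_3 - E(Y_3) \end{bmatrix} \begin{bmatrix} Y_1 - E(Y_1) & Y_2 - E(Y_2) & Y_3 - E(Y_3) \end{bmatrix} \right\}$$

Multiplying the two matrices and then taking expectations, we obtain:

Location in Product    Term                                Expected Value

Row 1, column 1        $[Y_1 - E(Y_1)]^2$                  $\sigma^2(Y_1)$

Row 1, column 2        $[Y_1 - E(Y_1)][Y_2 - E(Y_2)]$      $\sigma(Y_1, Y_2)$

Row 1, column 3        $[Y_1 - E(Y_1)][Y_3 - E(Y_3)]$      $\sigma(Y_1, Y_3)$

Row 2, column 1        $[Y_2 - E(Y_2)][Y_1 - E(Y_1)]$      $\sigma(Y_2, Y_1)$

etc.                   etc.                                etc.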




This,  of  course,  leads  to  the  variance-covariance  matrix  in  (6.43).  Remember  the 
definitions  of  variance  and  covariance  in  (1.14)  and  (1.19),  respectively,  when 
taking  expectations. 

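To generalize, the variance-covariance matrix for an $n \times 1$ random vector Y
is:

(6.45)
$$\underset{n \times n}{\sigma^2(\mathbf{Y})} = \begin{bmatrix} \sigma^2(Y_1) & \sigma(Y_1, Y_2) & \cdots & \sigma(Y_1, Y_n) \\ \sigma(Y_2, Y_1) & \sigma^2(Y_2) & \cdots & \sigma(Y_2, Y_n) \\ \vdots & \vdots & & \vdots \\ \sigma(Y_n, Y_1) & \sigma(Y_n, Y_2) & \cdots & \sigma^2(Y_n) \end{bmatrix}$$

Note again that $\sigma^2(\mathbf{Y})$ is a symmetric matrix.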


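Regression example. Let us return to the example based on $n = 3$ observations.
Suppose that the three error terms have constant variance, $\sigma^2(\varepsilon_i) = \sigma^2$,
and are uncorrelated so that $\sigma(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$. We can then write the
variance-covariance matrix for the random vector $\boldsymbol{\varepsilon}$ of the previous example as
follows:

$$\sigma^2(\boldsymbol{\varepsilon}) = \sigma^2 \mathbf{I}$$

since:

$$\sigma^2 \mathbf{I} = \sigma^2 \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} \sigma^2 & 0 & 0 \\ 0 & \sigma^2 & 0 \\ 0 & 0 & \sigma^2 \end{bmatrix}$$

Note that all variances are $\sigma^2$ and all covariances are zero.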


Some  basic  theorems 

Frequently,  we  shall  encounter  a  random  vector  W  which  is  obtained  by 
premultiplying  the  random  vector  Y  by  a  constant  matrix  A  (a  matrix  whose 
elements  are  fixed): 

(6.46)  W  =  AY 

Some  basic  theorems  for  this  case  are: 

(6.47)  E(A)  =  A 

(6.48)  E(W)  =  E(AY)  =  AE(Y) 

(6.49)  $\sigma^2(\mathbf{W}) = \sigma^2(\mathbf{AY}) = \mathbf{A}[\sigma^2(\mathbf{Y})]\mathbf{A}'$

where $\sigma^2(\mathbf{Y})$ is the variance-covariance matrix of Y.
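
A small numerical sketch may help fix ideas about theorems (6.48) and (6.49). The mean vector and variance-covariance matrix of Y used here are hypothetical values chosen only for illustration, and the helpers `matmul` and `transpose` are plain-Python stand-ins for a matrix library, not anything defined in the text.

```python
def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(P):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*P)]


# Constant matrix A, chosen so that W1 = Y1 - Y2 and W2 = Y1 + Y2.
A = [[1.0, -1.0],
     [1.0, 1.0]]

# Hypothetical moments of Y (assumed values, not taken from the text).
mean_Y = [[10.0], [4.0]]     # E(Y), a 2 x 1 column vector
cov_Y = [[4.0, 1.0],         # sigma^2(Y1) = 4, sigma(Y1, Y2) = 1
         [1.0, 9.0]]         # sigma(Y2, Y1) = 1, sigma^2(Y2) = 9

mean_W = matmul(A, mean_Y)                       # theorem (6.48): E(W) = A E(Y)
cov_W = matmul(matmul(A, cov_Y), transpose(A))   # theorem (6.49): A sigma^2(Y) A'

print(mean_W)
print(cov_W)
```

Under these assumed inputs, the diagonal of `cov_W` can be checked by hand against $\sigma^2(Y_1) + \sigma^2(Y_2) \mp 2\sigma(Y_1, Y_2)$, and the off-diagonal element against $\sigma^2(Y_1) - \sigma^2(Y_2)$.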

Example.  As  a  simple  illustration  of  the  use  of  these  theorems,  consider: 


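$$\underset{2 \times 1}{\mathbf{W}} = \begin{bmatrix} W_1 \\ W_2 \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \begin{bmatrix} Y_1 - Y_2 \\ Y_1 + Y_2 \end{bmatrix}$$

where A is the $2 \times 2$ matrix of constants and Y is $2 \times 1$, so that W = AY is $2 \times 1$.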




We  then  have  by  (6.48): 


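$$E(\mathbf{W}) = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} E(Y_1) \\ E(Y_2) \end{bmatrix} = \begin{bmatrix} E(Y_1) - E(Y_2) \\ E(Y_1) + E(Y_2) \end{bmatrix}$$

and by (6.49):

$$\sigma^2(\mathbf{W}) = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} \sigma^2(Y_1) & \sigma(Y_1, Y_2) \\ \sigma(Y_2, Y_1) & \sigma^2(Y_2) \end{bmatrix} \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}$$

$$= \begin{bmatrix} \sigma^2(Y_1) + \sigma^2(Y_2) - 2\sigma(Y_1, Y_2) & \sigma^2(Y_1) - \sigma^2(Y_2) \\ \sigma^2(Y_1) - \sigma^2(Y_2) & \sigma^2(Y_1) + \sigma^2(Y_2) + 2\sigma(Y_1, Y_2) \end{bmatrix}$$

Thus:

$$\sigma^2(W_1) = \sigma^2(Y_1 - Y_2) = \sigma^2(Y_1) + \sigma^2(Y_2) - 2\sigma(Y_1, Y_2)$$
$$\sigma^2(W_2) = \sigma^2(Y_1 + Y_2) = \sigma^2(Y_1) + \sigma^2(Y_2) + 2\sigma(Y_1, Y_2)$$
$$\sigma(W_1, W_2) = \sigma(Y_1 - Y_2, \; Y_1 + Y_2) = \sigma^2(Y_1) - \sigma^2(Y_2)$$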

6.9  SIMPLE  LINEAR  REGRESSION  MODEL  IN  MATRIX  TERMS 

We  are  now  ready  to  develop  simple  linear  regression  in  matrix  terms.  Re¬ 
member  again  that  we  will  not  present  any  new  results,  but  shall  only  state  in 
matrix  terms  the  results  obtained  earlier.  We  shall  begin  with  the  regression 
model  (3.1): 

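(6.50)  $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \qquad i = 1, \ldots, n$

This implies:

(6.51)
$$\begin{aligned} Y_1 &= \beta_0 + \beta_1 X_1 + \varepsilon_1 \\ Y_2 &= \beta_0 + \beta_1 X_2 + \varepsilon_2 \\ &\;\;\vdots \\ Y_n &= \beta_0 + \beta_1 X_n + \varepsilon_n \end{aligned}$$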

We defined earlier the observation vector Y in (6.4), the X matrix in (6.6), the $\boldsymbol{\varepsilon}$
vector in (6.10), and the $\boldsymbol{\beta}$ vector in (6.13). Let us repeat these definitions:


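(6.52)
$$\underset{n \times 1}{\mathbf{Y}} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} \qquad \underset{n \times 2}{\mathbf{X}} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix} \qquad \underset{2 \times 1}{\boldsymbol{\beta}} = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} \qquad \underset{n \times 1}{\boldsymbol{\varepsilon}} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$

Now we can write (6.51) in matrix terms compactly as follows:

(6.53)  $\underset{n \times 1}{\mathbf{Y}} = \underset{n \times 2}{\mathbf{X}} \; \underset{2 \times 1}{\boldsymbol{\beta}} + \underset{n \times 1}{\boldsymbol{\varepsilon}}$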




since: 


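$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix} = \begin{bmatrix} \beta_0 + \beta_1 X_1 + \varepsilon_1 \\ \beta_0 + \beta_1 X_2 + \varepsilon_2 \\ \vdots \\ \beta_0 + \beta_1 X_n + \varepsilon_n \end{bmatrix}$$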

The column of 1's in the X matrix may be viewed as consisting of the dummy
variable $X_0 = 1$ in the alternative regression model (2.5):

$$Y_i = \beta_0 X_0 + \beta_1 X_i + \varepsilon_i \qquad \text{where } X_0 = 1$$

Thus, the X matrix may be considered to contain a column vector of the dummy
variable $X_0$ and another column vector consisting of the independent variable
observations $X_i$.

With respect to the error terms, model (3.1) assumes that $E(\varepsilon_i) = 0$, $\sigma^2(\varepsilon_i) = \sigma^2$,
and that the $\varepsilon_i$ are independent normal random variables. The condition
$E(\varepsilon_i) = 0$ in matrix terms is:


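(6.54)  $E(\boldsymbol{\varepsilon}) = \mathbf{0}$

since (6.54) states:

$$E\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix} = \begin{bmatrix} E(\varepsilon_1) \\ E(\varepsilon_2) \\ \vdots \\ E(\varepsilon_n) \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$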

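The condition that the error terms have constant variance $\sigma^2$ and that all
covariances $\sigma(\varepsilon_i, \varepsilon_j)$ for $i \neq j$ are zero (since the $\varepsilon_i$ are independent) is expressed
in matrix terms through the variance-covariance matrix:

(6.55)  $\underset{n \times n}{\sigma^2(\boldsymbol{\varepsilon})} = \sigma^2 \underset{n \times n}{\mathbf{I}}$

since (6.55) states:

$$\sigma^2(\boldsymbol{\varepsilon}) = \sigma^2 \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix} = \begin{bmatrix} \sigma^2 & 0 & 0 & \cdots & 0 \\ 0 & \sigma^2 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & \sigma^2 \end{bmatrix}$$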

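Thus, the normal error model (3.1) in matrix terms is:

(6.56)  $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$

where:

$\boldsymbol{\varepsilon}$ is a vector of independent normal random variables with $E(\boldsymbol{\varepsilon}) = \mathbf{0}$
and $\sigma^2(\boldsymbol{\varepsilon}) = \sigma^2 \mathbf{I}$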


6.10  LEAST  SQUARES  ESTIMATION  OF 
REGRESSION  PARAMETERS 


Normal  equations 

The  normal  equations  (2.9): 

(6.57)
$$\begin{aligned} n b_0 + b_1 \sum X_i &= \sum Y_i \\ b_0 \sum X_i + b_1 \sum X_i^2 &= \sum X_i Y_i \end{aligned}$$

in  matrix  terms  are: 


(6.58)  X'Xb  =  X'Y 

where  b  is  the  vector  of  the  least  squares  regression  coefficients: 

(6.58a)  $\underset{2 \times 1}{\mathbf{b}} = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix}$


To  see  this,  recall  that  we  obtained  X'X  in  (6.16)  and  X'Y  in  (6.17).  Equation 
(6.58)  thus  states: 


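$$\begin{bmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = \begin{bmatrix} \sum Y_i \\ \sum X_i Y_i \end{bmatrix}$$

or:

$$\begin{bmatrix} n b_0 + b_1 \sum X_i \\ b_0 \sum X_i + b_1 \sum X_i^2 \end{bmatrix} = \begin{bmatrix} \sum Y_i \\ \sum X_i Y_i \end{bmatrix}$$
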

These  are  precisely  the  normal  equations  in  (6.57). 
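
As a rough check on the correspondence between X'X, X'Y and the sums in the normal equations, the following sketch builds both products from a tiny data set. The three (X, Y) pairs are hypothetical and serve only to make the arithmetic easy to follow; `matmul` and `transpose` are illustrative helpers, not part of the text.

```python
def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(P):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*P)]


# Hypothetical observations (assumed for illustration only).
x = [1.0, 2.0, 4.0]
y = [3.0, 5.0, 8.0]
n = len(x)

X = [[1.0, xi] for xi in x]      # column of 1's, then the X_i values
Y = [[yi] for yi in y]           # n x 1 observation vector

XtX = matmul(transpose(X), X)    # should hold n, sum X_i, sum X_i^2
XtY = matmul(transpose(X), Y)    # should hold sum Y_i, sum X_i Y_i

# The same quantities computed directly as sums.
sums_lhs = [[n, sum(x)],
            [sum(x), sum(xi * xi for xi in x)]]
sums_rhs = [[sum(y)],
            [sum(xi * yi for xi, yi in zip(x, y))]]

print(XtX == sums_lhs, XtY == sums_rhs)
```

The comparison at the end is the point of the sketch: the entries of X'X and X'Y are exactly the sums that appear as coefficients and right-hand sides in (6.57).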


Estimated  regression  coefficients 

To  obtain  the  estimated  regression  coefficients  from  the  normal  equations: 

X'Xb  =  X'Y 

by  matrix  methods,  we  premultiply  both  sides  by  the  inverse  of  X'X  (we  assume 
this  exists): 

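$$(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$

so that we find, since $(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X} = \mathbf{I}$ and $\mathbf{I}\mathbf{b} = \mathbf{b}$:

(6.59)  $\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$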

The estimators $b_0$ and $b_1$ in b are the same as those given earlier in (2.10a) and
(2.10b). We shall demonstrate this by an example.




Example.  Let  us  find  the  estimated  regression  coefficients  for  the  Westwood 
Company  lot  size  example  by  matrix  methods.  From  earlier  work,  we  have 
(Table  2.2): 

$$n = 10 \qquad \sum Y_i = 1{,}100 \qquad \sum X_i = 500 \qquad \sum X_i^2 = 28{,}400 \qquad \sum X_i Y_i = 61{,}800$$


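Let us now use (6.26) to evaluate $(\mathbf{X}'\mathbf{X})^{-1}$. We have:

$$n\sum(X_i - \bar{X})^2 = n\left[\sum X_i^2 - \frac{(\sum X_i)^2}{n}\right] = 10\left[28{,}400 - \frac{(500)^2}{10}\right] = 34{,}000$$

Therefore:

$$(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} \dfrac{\sum X_i^2}{n\sum(X_i - \bar{X})^2} & \dfrac{-\sum X_i}{n\sum(X_i - \bar{X})^2} \\[2ex] \dfrac{-\sum X_i}{n\sum(X_i - \bar{X})^2} & \dfrac{n}{n\sum(X_i - \bar{X})^2} \end{bmatrix} = \begin{bmatrix} \dfrac{28{,}400}{34{,}000} & \dfrac{-500}{34{,}000} \\[2ex] \dfrac{-500}{34{,}000} & \dfrac{10}{34{,}000} \end{bmatrix}$$

$$= \begin{bmatrix} .83529412 & -.01470588 \\ -.01470588 & .00029412 \end{bmatrix}$$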

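We also wish to make use of (6.17) to evaluate X'Y:

$$\mathbf{X}'\mathbf{Y} = \begin{bmatrix} \sum Y_i \\ \sum X_i Y_i \end{bmatrix} = \begin{bmatrix} 1{,}100 \\ 61{,}800 \end{bmatrix}$$

Hence, by (6.59):

$$\mathbf{b} = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = \begin{bmatrix} .83529412 & -.01470588 \\ -.01470588 & .00029412 \end{bmatrix} \begin{bmatrix} 1{,}100 \\ 61{,}800 \end{bmatrix} = \begin{bmatrix} 10.0 \\ 2.0 \end{bmatrix}$$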

or $b_0 = 10.0$ and $b_1 = 2.0$. This agrees with the results in Chapter 2. Any
difference would have been due to rounding errors.
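
The calculation of b from the summary statistics can be sketched in a few lines. The sums are the ones quoted from Table 2.2; the helper `inverse_2x2` is an illustrative function written for this sketch and applies only to the 2 x 2 case.

```python
def inverse_2x2(M):
    """Invert a 2 x 2 matrix stored as a list of rows."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular; no inverse exists")
    return [[d / det, -b / det],
            [-c / det, a / det]]


# Summary statistics quoted in the example (Table 2.2).
n = 10
sum_x, sum_y = 500.0, 1100.0
sum_x2, sum_xy = 28400.0, 61800.0

XtX = [[n, sum_x],
       [sum_x, sum_x2]]      # X'X
XtY = [sum_y, sum_xy]        # X'Y

XtX_inv = inverse_2x2(XtX)

# b = (X'X)^{-1} X'Y, written out row by row for the 2 x 1 case.
b0 = XtX_inv[0][0] * XtY[0] + XtX_inv[0][1] * XtY[1]
b1 = XtX_inv[1][0] * XtY[0] + XtX_inv[1][1] * XtY[1]

print(round(b0, 6), round(b1, 6))
```

Up to floating-point error, the printed values should agree with the hand calculation above.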

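To reduce the effect of rounding errors when obtaining the vector b by hand
calculations, it is often desirable to move the constant in the denominator of the
elements of $(\mathbf{X}'\mathbf{X})^{-1}$ outside the matrix, and do the division as the last step. For
our example, this would lead to:

$$(\mathbf{X}'\mathbf{X})^{-1} = \frac{1}{n\sum(X_i - \bar{X})^2} \begin{bmatrix} \sum X_i^2 & -\sum X_i \\ -\sum X_i & n \end{bmatrix} = \frac{1}{34{,}000} \begin{bmatrix} 28{,}400 & -500 \\ -500 & 10 \end{bmatrix}$$

$$\mathbf{b} = \frac{1}{34{,}000} \begin{bmatrix} 28{,}400 & -500 \\ -500 & 10 \end{bmatrix} \begin{bmatrix} 1{,}100 \\ 61{,}800 \end{bmatrix} = \frac{1}{34{,}000} \begin{bmatrix} 340{,}000 \\ 68{,}000 \end{bmatrix} = \begin{bmatrix} 10.0 \\ 2.0 \end{bmatrix}$$
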



In this instance, the two methods of calculation lead to identical results.
Often, however, postponing division by $n\sum(X_i - \bar{X})^2$ until the end yields more
accurate results.
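
The effect of postponing the division can be illustrated with exact rational arithmetic. This is only a sketch: it uses Python's `fractions.Fraction` to mimic a hand calculation that divides last, and compares it with a calculation based on the inverse rounded to eight decimal places as printed in the example.

```python
from fractions import Fraction

# Integer numerators of (X'X)^{-1} and the common denominator 34,000.
adjugate = [[28400, -500],
            [-500, 10]]
denominator = 34000
XtY = [1100, 61800]

# Multiply first, divide last: the intermediate results stay exact integers.
numerators = [adjugate[0][0] * XtY[0] + adjugate[0][1] * XtY[1],
              adjugate[1][0] * XtY[0] + adjugate[1][1] * XtY[1]]
b_exact = [Fraction(v, denominator) for v in numerators]

# Divide first: use the inverse already rounded to eight decimals.
inv_rounded = [[0.83529412, -0.01470588],
               [-0.01470588, 0.00029412]]
b_rounded = [inv_rounded[0][0] * XtY[0] + inv_rounded[0][1] * XtY[1],
             inv_rounded[1][0] * XtY[0] + inv_rounded[1][1] * XtY[1]]

print(numerators, b_exact)
print(b_rounded)
```

With these numbers the rounded inverse reproduces the coefficients only approximately, with the discrepancy appearing in the later decimal places, whereas dividing last gives them exactly.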


Comments 

1.  To  derive  the  normal  equations  by  the  method  of  least  squares,  we  minimize  the 
quantity: 

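$$Q = \sum [Y_i - (\beta_0 + \beta_1 X_i)]^2$$

In matrix notation:

(6.60)  $Q = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$

Expanding out, we obtain:

$$Q = \mathbf{Y}'\mathbf{Y} - \boldsymbol{\beta}'\mathbf{X}'\mathbf{Y} - \mathbf{Y}'\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta}$$

since $(\mathbf{X}\boldsymbol{\beta})' = \boldsymbol{\beta}'\mathbf{X}'$ by (6.35). Note now that $\mathbf{Y}'\mathbf{X}\boldsymbol{\beta}$ is $1 \times 1$, hence is equal to its
transpose, which according to (6.36) is $\boldsymbol{\beta}'\mathbf{X}'\mathbf{Y}$. Thus, we find:

(6.61)  $Q = \mathbf{Y}'\mathbf{Y} - 2\boldsymbol{\beta}'\mathbf{X}'\mathbf{Y} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta}$

To find the value of $\boldsymbol{\beta}$ which minimizes Q, we differentiate with respect to $\beta_0$ and $\beta_1$.
Let:

(6.62)  $\dfrac{\partial}{\partial \boldsymbol{\beta}}(Q) = \begin{bmatrix} \dfrac{\partial Q}{\partial \beta_0} \\[2ex] \dfrac{\partial Q}{\partial \beta_1} \end{bmatrix}$

Then it follows that:

(6.63)  $\dfrac{\partial}{\partial \boldsymbol{\beta}}(Q) = -2\mathbf{X}'\mathbf{Y} + 2\mathbf{X}'\mathbf{X}\boldsymbol{\beta}$

Equating to zero and substituting b for $\boldsymbol{\beta}$ gives the matrix form of the least squares normal
equations: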


X'Xb  =  X'Y 

2.  A comparison of the normal equations and X'X shows that whenever the columns
of X'X are linearly dependent, the normal equations will be linearly dependent also. No
unique solutions can be obtained for $b_0$ and $b_1$ in that case. Fortunately, in most regression
applications, the columns of X'X are linearly independent, leading to unique solutions for
$b_0$ and $b_1$.
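
A minimal sketch of the linear dependence issue: in simple regression the determinant of X'X equals $n\sum(X_i - \bar{X})^2$, so it vanishes when every observation has the same X level. The X values below are hypothetical and the function names are illustrative only.

```python
def xtx_from_x(x_values):
    """Build the 2 x 2 matrix X'X for simple regression from the X levels."""
    n = len(x_values)
    sum_x = sum(x_values)
    sum_x2 = sum(v * v for v in x_values)
    return [[n, sum_x],
            [sum_x, sum_x2]]


def det_2x2(M):
    """Determinant of a 2 x 2 matrix stored as a list of rows."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


# Hypothetical X levels: one set with spread, one with no spread at all.
det_varied = det_2x2(xtx_from_x([1, 2, 4]))
det_constant = det_2x2(xtx_from_x([5, 5, 5]))

print(det_varied, det_constant)
```

When the determinant is zero, as for the constant X levels, X'X has no inverse and (6.59) cannot be evaluated, which is the matrix counterpart of the statement that no unique solution exists.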

6.11  ANALYSIS  OF  VARIANCE  RESULTS 

Fitted  values  and  residuals 

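Let the vector of the fitted values $\hat{Y}_i$ be denoted by $\hat{\mathbf{Y}}$:

(6.64)  $\underset{n \times 1}{\hat{\mathbf{Y}}} = \begin{bmatrix} \hat{Y}_1 \\ \hat{Y}_2 \\ \vdots \\ \hat{Y}_n \end{bmatrix}$

and the vector of the residuals $e_i = Y_i - \hat{Y}_i$ be denoted by e:

(6.65)  $\underset{n \times 1}{\mathbf{e}} = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}$

In matrix notation, we then have:

(6.66)  $\underset{n \times 1}{\hat{\mathbf{Y}}} = \underset{n \times 2}{\mathbf{X}} \; \underset{2 \times 1}{\mathbf{b}}$

because:

$$\begin{bmatrix} \hat{Y}_1 \\ \hat{Y}_2 \\ \vdots \\ \hat{Y}_n \end{bmatrix} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = \begin{bmatrix} b_0 + b_1 X_1 \\ b_0 + b_1 X_2 \\ \vdots \\ b_0 + b_1 X_n \end{bmatrix}$$

Similarly:

(6.67)  $\underset{n \times 1}{\mathbf{e}} = \underset{n \times 1}{\mathbf{Y}} - \underset{n \times 1}{\hat{\mathbf{Y}}} = \underset{n \times 1}{\mathbf{Y}} - \underset{n \times 1}{\mathbf{X}\mathbf{b}}$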


Sums  of  squares 

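To see how the sums of squares are expressed in matrix notation, we begin
with SSTO. We know from (3.49) that:

(6.68)  $SSTO = \sum Y_i^2 - n\bar{Y}^2 = \sum Y_i^2 - \dfrac{(\sum Y_i)^2}{n}$

We also know from (6.15) that:

$$\mathbf{Y}'\mathbf{Y} = \sum Y_i^2$$

The subtraction term $n\bar{Y}^2 = (\sum Y_i)^2/n$ in matrix form uses 1, the vector of 1's
defined in (6.19), as follows:

(6.69)  $\dfrac{(\sum Y_i)^2}{n} = \left(\dfrac{1}{n}\right)\mathbf{Y}'\mathbf{1}\mathbf{1}'\mathbf{Y}$

For instance, if $n = 2$, we have:

$$\left(\frac{1}{2}\right) \begin{bmatrix} Y_1 & Y_2 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \frac{(Y_1 + Y_2)(Y_1 + Y_2)}{2} = \frac{(\sum Y_i)^2}{n}$$

Hence, it follows that:

(6.70a)  $SSTO = \mathbf{Y}'\mathbf{Y} - \left(\dfrac{1}{n}\right)\mathbf{Y}'\mathbf{1}\mathbf{1}'\mathbf{Y}$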

Just as $\sum Y_i^2$ is represented by Y'Y in matrix terms, so $SSE = \sum e_i^2 =
\sum (Y_i - \hat{Y}_i)^2$ can be represented as follows:


(6.70b)  SSE  =  e'e  =  (Y  -  Xb)'(Y  -  Xb) 

which  can  be  shown  to  equal: 

(6.70c)  SSE  =  Y'Y  -  b'X'Y 

Finally,  it  can  be  shown  that: 


(6.70d)  $SSR = \mathbf{b}'\mathbf{X}'\mathbf{Y} - \left(\dfrac{1}{n}\right)\mathbf{Y}'\mathbf{1}\mathbf{1}'\mathbf{Y}$


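Example. Let us find SSE for the Westwood Company lot size example
by matrix methods, using (6.70c). We know from earlier results:

$$\mathbf{Y}'\mathbf{Y} = \sum Y_i^2 = 134{,}660$$

We also know from earlier:

$$\mathbf{b} = \begin{bmatrix} 10.0 \\ 2.0 \end{bmatrix} \qquad \mathbf{X}'\mathbf{Y} = \begin{bmatrix} 1{,}100 \\ 61{,}800 \end{bmatrix}$$

Hence:

$$\mathbf{b}'\mathbf{X}'\mathbf{Y} = \begin{bmatrix} 10.0 & 2.0 \end{bmatrix} \begin{bmatrix} 1{,}100 \\ 61{,}800 \end{bmatrix} = 134{,}600$$

and:

$$SSE = \mathbf{Y}'\mathbf{Y} - \mathbf{b}'\mathbf{X}'\mathbf{Y} = 134{,}660 - 134{,}600 = 60$$

which is the same result as that obtained in Chapter 2. Any difference would
have been due to rounding errors.

Similarly, we can find SSR using (6.70d):

$$SSR = \mathbf{b}'\mathbf{X}'\mathbf{Y} - \left(\frac{1}{n}\right)\mathbf{Y}'\mathbf{1}\mathbf{1}'\mathbf{Y} = 134{,}600 - 10(110)^2 = 13{,}600$$

since the subtraction term in SSR equals $n\bar{Y}^2$, and $\bar{Y} = 110$ for the Westwood
Company example.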
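
The three sums of squares can be obtained together from the quantities already quoted for this example. The sketch below uses only Y'Y, b, X'Y and n as given above; the variable names are illustrative.

```python
# Quantities quoted for the Westwood Company example.
n = 10
YtY = 134660.0                 # Y'Y = sum of Y_i^2
b = [10.0, 2.0]                # estimated regression coefficients
XtY = [1100.0, 61800.0]        # X'Y = [sum Y_i, sum X_i Y_i]

# b'X'Y is a 1 x 1 product, so it can be written as a simple dot product.
btXtY = b[0] * XtY[0] + b[1] * XtY[1]

# (1/n) Y'11'Y equals (sum Y_i)^2 / n, and sum Y_i is the first element of X'Y.
correction = XtY[0] ** 2 / n

ssto = YtY - correction        # (6.70a)
sse = YtY - btXtY              # (6.70c)
ssr = btXtY - correction       # (6.70d)

print(ssto, ssr, sse)
```

A useful consistency check on any such calculation is that SSTO should equal SSR plus SSE.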




Note 

To  illustrate  the  derivation  of  the  sums  of  squares  expressions  in  matrix  notation, 
consider  SSE: 

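$$SSE = \mathbf{e}'\mathbf{e} = (\mathbf{Y} - \mathbf{X}\mathbf{b})'(\mathbf{Y} - \mathbf{X}\mathbf{b}) = \mathbf{Y}'\mathbf{Y} - 2\mathbf{b}'\mathbf{X}'\mathbf{Y} + \mathbf{b}'\mathbf{X}'\mathbf{X}\mathbf{b}$$

In substituting for the right-most b we obtain by (6.59):

$$\begin{aligned} SSE &= \mathbf{Y}'\mathbf{Y} - 2\mathbf{b}'\mathbf{X}'\mathbf{Y} + \mathbf{b}'\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} \\ &= \mathbf{Y}'\mathbf{Y} - 2\mathbf{b}'\mathbf{X}'\mathbf{Y} + \mathbf{b}'\mathbf{I}\mathbf{X}'\mathbf{Y} \end{aligned}$$

In dropping I and subtracting, we obtain the result in (6.70c):

$$SSE = \mathbf{Y}'\mathbf{Y} - \mathbf{b}'\mathbf{X}'\mathbf{Y}$$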


Sums  of  squares  as  quadratic  forms 

The ANOVA sums of squares can be shown to be quadratic forms. An example
of a quadratic form of the observations $Y_i$ when $n = 2$ is:

(6.71)  $5Y_1^2 + 6Y_1Y_2 + 4Y_2^2$

Note that this expression is a second-degree polynomial containing terms involving
the squares of the observations and the cross product. We can express (6.71)
in matrix terms as follows:

(6.71a)  $\begin{bmatrix} Y_1 & Y_2 \end{bmatrix} \begin{bmatrix} 5 & 3 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \mathbf{Y}'\mathbf{A}\mathbf{Y}$

where A is a symmetric matrix of coefficients.
In general, a quadratic form is defined as:

(6.72)  $\underset{1 \times 1}{\mathbf{Y}'\mathbf{A}\mathbf{Y}} = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} Y_i Y_j \qquad \text{where } a_{ij} = a_{ji}$

A  is  a  symmetric  n  X  n  matrix  and  is  called  the  matrix  of  the  quadratic  form. 
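
The double sum in (6.72) translates directly into code. In this sketch the function `quadratic_form` and the observation values are illustrative assumptions; the matrix A is the one from (6.71a).

```python
def quadratic_form(y, A):
    """Evaluate Y'AY as the double sum of a_ij * Y_i * Y_j."""
    n = len(y)
    return sum(A[i][j] * y[i] * y[j] for i in range(n) for j in range(n))


# Matrix of the quadratic form in (6.71a).
A = [[5, 3],
     [3, 4]]

# Hypothetical observations Y1 = 1, Y2 = 2 (assumed for illustration).
y = [1, 2]

value_from_matrix = quadratic_form(y, A)
value_from_polynomial = 5 * y[0] ** 2 + 6 * y[0] * y[1] + 4 * y[1] ** 2

print(value_from_matrix, value_from_polynomial)
```

Note how the cross-product coefficient 6 in the polynomial is split equally between the two off-diagonal elements of A, which is what makes A symmetric.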

The  ANOVA  sums  of  squares  SSTO,  SSR,  and  SSE  are  all  quadratic  forms.  To 
see  this,  we  need  to  express  the  matrix  forms  for  these  sums  of  squares  in  (6.70) 
still  more  compactly.  We  do  this  by  noting  that: 

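(6.73)  $\underset{n \times 1}{\mathbf{1}} \; \underset{1 \times n}{\mathbf{1}'} = \underset{n \times n}{\mathbf{J}}$

where J is the $n \times n$ matrix all of whose elements are 1's, as defined in (6.20).
Also, the transpose of b in (6.59) can be obtained using (6.36) and (6.33):

(6.74)  $\mathbf{b}' = [(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}]' = \mathbf{Y}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}$

by noting that $(\mathbf{X}'\mathbf{X})^{-1}$ is a symmetric matrix so that it equals its transpose.
Hence:

(6.75a)  $SSTO = \mathbf{Y}'\left[\mathbf{I} - \left(\dfrac{1}{n}\right)\mathbf{J}\right]\mathbf{Y}$

(6.75b)  $SSR = \mathbf{Y}'\left[\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' - \left(\dfrac{1}{n}\right)\mathbf{J}\right]\mathbf{Y}$

(6.75c)  $SSE = \mathbf{Y}'[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}']\mathbf{Y}$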

Each  of  these  sums  of  squares  can  now  be  seen  to  be  of  the  form  Y'AY.  It  can 
be  shown  that  the  three  A  matrices: 


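(6.76a)  $\mathbf{I} - \left(\dfrac{1}{n}\right)\mathbf{J}$

(6.76b)  $\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' - \left(\dfrac{1}{n}\right)\mathbf{J}$

(6.76c)  $\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$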

are  symmetric.  Hence,  SSTO,  SSR,  and  SSE  are  quadratic  forms,  with  the  matri¬ 
ces  of  the  quadratic  forms  given  in  (6.76).  Quadratic  forms  play  an  important 
role  in  statistics  because  all  sums  of  squares  in  the  analysis  of  variance  for  linear 
statistical  models  can  be  expressed  as  quadratic  forms. 
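
To see the three quadratic forms at work, the sketch below evaluates Y'AY for each of the three matrices of the quadratic forms on a small data set. The four (X, Y) pairs are hypothetical, exact fractions are used so that the comparison is free of rounding, and all helper names are illustrative.

```python
from fractions import Fraction


def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def subtract(P, Q):
    """Elementwise difference of two matrices of the same dimension."""
    return [[p - q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]


def quadratic_form(y, A):
    n = len(y)
    return sum(A[i][j] * y[i] * y[j] for i in range(n) for j in range(n))


# Hypothetical data (assumed for illustration only).
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
n = len(x)

X = [[Fraction(1), Fraction(xi)] for xi in x]
H = matmul(matmul(X, inverse_2x2(matmul(transpose(X), X))), transpose(X))
I = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
J_over_n = [[Fraction(1, n)] * n for _ in range(n)]

ssto = quadratic_form(y, subtract(I, J_over_n))   # matrix I - (1/n)J
ssr = quadratic_form(y, subtract(H, J_over_n))    # matrix X(X'X)^{-1}X' - (1/n)J
sse = quadratic_form(y, subtract(I, H))           # matrix I - X(X'X)^{-1}X'

print(ssto, ssr, sse)
```

Because the three matrices satisfy [I - (1/n)J] = [X(X'X)^{-1}X' - (1/n)J] + [I - X(X'X)^{-1}X'], the quadratic forms add up in the same way, which is the familiar decomposition SSTO = SSR + SSE.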


6.12  INFERENCES  IN  REGRESSION  ANALYSIS 

As  we  saw  in  earlier  chapters,  all  interval  estimates  are  of  the  following  form: 
point  estimator  plus  and  minus  a  certain  number  of  estimated  standard  deviations 
of  the  point  estimator.  Similarly,  all  tests  require  the  point  estimator  and  the 
estimated  standard  deviation  of  the  point  estimator  or,  in  the  case  of  analysis  of 
variance  tests,  various  sums  of  squares.  Matrix  algebra  is  of  principal  help  in 
inference  making  when  obtaining  the  estimated  standard  deviations  and  sums  of 
squares.  We  have  already  given  the  matrix  equivalents  of  the  sums  of  squares  for 
the  analysis  of  variance.  Hence,  we  focus  here  chiefly  on  the  matrix  expressions 
for  the  estimated  standard  deviations  of  point  estimators  of  interest. 


Regression  coefficients 

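The variance-covariance matrix of b:

(6.77)  $\underset{2 \times 2}{\sigma^2(\mathbf{b})} = \begin{bmatrix} \sigma^2(b_0) & \sigma(b_0, b_1) \\ \sigma(b_1, b_0) & \sigma^2(b_1) \end{bmatrix}$

is:

(6.78)  $\underset{2 \times 2}{\sigma^2(\mathbf{b})} = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$

or, using (6.27):

(6.78a)  $\sigma^2(\mathbf{b}) = \begin{bmatrix} \dfrac{\sigma^2 \sum X_i^2}{n\sum(X_i - \bar{X})^2} & \dfrac{-\bar{X}\sigma^2}{\sum(X_i - \bar{X})^2} \\[2ex] \dfrac{-\bar{X}\sigma^2}{\sum(X_i - \bar{X})^2} & \dfrac{\sigma^2}{\sum(X_i - \bar{X})^2} \end{bmatrix}$
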


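When MSE is substituted for $\sigma^2$ in (6.78a) we have:

(6.79)  $\underset{2 \times 2}{s^2(\mathbf{b})} = MSE(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} \dfrac{MSE \sum X_i^2}{n\sum(X_i - \bar{X})^2} & \dfrac{-\bar{X}\,MSE}{\sum(X_i - \bar{X})^2} \\[2ex] \dfrac{-\bar{X}\,MSE}{\sum(X_i - \bar{X})^2} & \dfrac{MSE}{\sum(X_i - \bar{X})^2} \end{bmatrix}$

where $s^2(\mathbf{b})$ is the estimated variance-covariance matrix of b. In (6.78a), you will
recognize the variances of $b_0$ (3.20b) and $b_1$ (3.3b) and the covariance of $b_0$ and
$b_1$ (5.3). Likewise, the estimated variances in (6.79) are familiar from earlier
chapters.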


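Joint confidence region for $\beta_0$ and $\beta_1$

The boundary for the joint confidence region for $\beta_0$ and $\beta_1$, given in (5.2), is
expressed in matrix terms as follows:

(6.80)  $\dfrac{(\mathbf{b} - \boldsymbol{\beta})'\mathbf{X}'\mathbf{X}(\mathbf{b} - \boldsymbol{\beta})}{2\,MSE} = F(1 - \alpha; 2, n - 2)$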


Mean  response 

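To estimate the mean response at $X_h$, let us define the vector:

(6.81)  $\underset{2 \times 1}{\mathbf{X}_h} = \begin{bmatrix} 1 \\ X_h \end{bmatrix} \quad \text{or} \quad \mathbf{X}_h' = \begin{bmatrix} 1 & X_h \end{bmatrix}$

The fitted value in matrix notation then is:

(6.82)  $\underset{1 \times 1}{\hat{Y}_h} = \mathbf{X}_h'\mathbf{b}$

since:

$$\mathbf{X}_h'\mathbf{b} = \begin{bmatrix} 1 & X_h \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = [b_0 + b_1 X_h] = [\hat{Y}_h] = \hat{Y}_h$$

Note that $\mathbf{X}_h'\mathbf{b}$ is a $1 \times 1$ matrix; hence, we can write the final result as a scalar.
The variance of $\hat{Y}_h$, given earlier in (3.28b), is in matrix notation:

(6.83)  $\sigma^2(\hat{Y}_h) = \sigma^2 \mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h = \mathbf{X}_h'\sigma^2(\mathbf{b})\mathbf{X}_h$

where $\sigma^2(\mathbf{b})$ is the variance-covariance matrix of the regression coefficients in
(6.78). Note, therefore, that $\sigma^2(\hat{Y}_h)$ is a function of the variances $\sigma^2(b_0)$ and
$\sigma^2(b_1)$ and of the covariance $\sigma(b_0, b_1)$.

The estimated variance of $\hat{Y}_h$, given earlier in (3.30), is in matrix notation:

(6.84)  $s^2(\hat{Y}_h) = MSE(\mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h) = \mathbf{X}_h' s^2(\mathbf{b}) \mathbf{X}_h$

where $s^2(\mathbf{b})$ is the estimated variance-covariance matrix of the regression coefficients
in (6.79).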




Prediction  of  new  observation 

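The estimated variance $s^2(Y_{h(\text{new})})$, given earlier in (3.37), is in matrix notation:

(6.85)  $s^2(Y_{h(\text{new})}) = MSE + s^2(\hat{Y}_h) = MSE + \mathbf{X}_h' s^2(\mathbf{b}) \mathbf{X}_h = MSE(1 + \mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h)$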


Examples 


1.  We wish to find $s^2(b_0)$ and $s^2(b_1)$ for the Westwood Company lot size
example by matrix methods. We found earlier that $MSE = 7.5$ and:

$$(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} .83529412 & -.01470588 \\ -.01470588 & .00029412 \end{bmatrix}$$

Hence, by (6.79):

$$s^2(\mathbf{b}) = MSE(\mathbf{X}'\mathbf{X})^{-1} = 7.5 \begin{bmatrix} .83529412 & -.01470588 \\ -.01470588 & .00029412 \end{bmatrix} = \begin{bmatrix} 6.264706 & -.1102941 \\ -.1102941 & .0022059 \end{bmatrix}$$

Thus, $s^2(b_0) = 6.26471$ and $s^2(b_1) = .002206$. These are the same as the results
obtained in Chapter 3.

Note how simple it is to find the estimated variances of the regression coefficients
as soon as $(\mathbf{X}'\mathbf{X})^{-1}$ has been obtained. This inverse is needed in the first
place  to  find  the  regression  coefficients,  so  that  practically  no  extra  work  is 
required  to  obtain  their  estimated  variances. 
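
The scaling of the inverse by MSE can be sketched as follows. The inputs are the Westwood summary statistics and the MSE quoted in the example; the inverse is rebuilt from the sums rather than typed in rounded form, and the variable names are illustrative.

```python
# Quantities quoted for the Westwood Company example.
mse = 7.5
n, sum_x, sum_x2 = 10, 500.0, 28400.0

# Determinant of X'X, which equals n * sum((X_i - Xbar)^2) = 34,000.
det = n * sum_x2 - sum_x ** 2

# (X'X)^{-1} for simple regression, from the 2 x 2 inverse formula.
XtX_inv = [[sum_x2 / det, -sum_x / det],
           [-sum_x / det, n / det]]

# s^2(b) = MSE * (X'X)^{-1}, as in (6.79).
s2_b = [[mse * v for v in row] for row in XtX_inv]

s2_b0 = s2_b[0][0]       # estimated variance of b0
s2_b1 = s2_b[1][1]       # estimated variance of b1
s_b0_b1 = s2_b[0][1]     # estimated covariance of b0 and b1

print(round(s2_b0, 6), round(s2_b1, 7), round(s_b0_b1, 7))
```

The estimated standard deviations used in interval estimates are the square roots of the two diagonal elements.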

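2.  We wish to find $s^2(\hat{Y}_h)$ for the Westwood Company example when $X_h = 55$.
We define:

$$\mathbf{X}_h' = \begin{bmatrix} 1 & 55 \end{bmatrix}$$

and obtain by (6.84):

$$s^2(\hat{Y}_{55}) = \mathbf{X}_h' s^2(\mathbf{b}) \mathbf{X}_h = \begin{bmatrix} 1 & 55 \end{bmatrix} \begin{bmatrix} 6.264706 & -.1102941 \\ -.1102941 & .0022059 \end{bmatrix} \begin{bmatrix} 1 \\ 55 \end{bmatrix} = .80520$$

This is the same result as that obtained in Chapter 3, except for a minor
difference due to rounding.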
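
A short sketch of (6.84) and (6.85) for $X_h = 55$, computed here from the unrounded $(\mathbf{X}'\mathbf{X})^{-1}$ so that the result is not affected by rounding of the matrix elements. The helper `quad` and the variable names are illustrative assumptions.

```python
def quad(v, S):
    """Evaluate v'Sv for a vector v and a square matrix S."""
    return sum(v[i] * S[i][j] * v[j]
               for i in range(len(v)) for j in range(len(v)))


# Quantities quoted for the Westwood Company example.
mse = 7.5
n, sum_x, sum_x2 = 10, 500.0, 28400.0

det = n * sum_x2 - sum_x ** 2
XtX_inv = [[sum_x2 / det, -sum_x / det],
           [-sum_x / det, n / det]]

x_h = [1.0, 55.0]                      # X_h' = [1  55]

h_term = quad(x_h, XtX_inv)            # X_h'(X'X)^{-1}X_h
s2_fit = mse * h_term                  # (6.84): estimated variance of Y-hat_h
s2_pred = mse * (1.0 + h_term)         # (6.85): estimated variance for a new observation

print(round(s2_fit, 5), round(s2_pred, 5))
```

The prediction variance always exceeds the variance of the fitted value by exactly MSE, reflecting the additional variability of a single new observation around its mean.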


Comments 

1.  To  illustrate  a  derivation  in  matrix  terms,  let  us  find  the  variance-covariance  matrix 
of  b.  Recall  that: 


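$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = \mathbf{A}\mathbf{Y}$$

where A is a constant matrix:

$$\mathbf{A} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$$

Hence by (6.49), we have:

$$\sigma^2(\mathbf{b}) = \mathbf{A}[\sigma^2(\mathbf{Y})]\mathbf{A}'$$

Now $\sigma^2(\mathbf{Y}) = \sigma^2\mathbf{I}$. Further, it follows from (6.74) that:

$$\mathbf{A}' = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}$$

We find therefore:

$$\begin{aligned} \sigma^2(\mathbf{b}) &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\sigma^2\mathbf{I}\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} \\ &= \sigma^2(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} \\ &= \sigma^2(\mathbf{X}'\mathbf{X})^{-1}\mathbf{I} \\ &= \sigma^2(\mathbf{X}'\mathbf{X})^{-1} \end{aligned}$$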

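2.  Since $\hat{Y}_h = \mathbf{X}_h'\mathbf{b}$, it follows at once from (6.49) that:

$$\sigma^2(\hat{Y}_h) = \mathbf{X}_h'[\sigma^2(\mathbf{b})]\mathbf{X}_h$$

Hence:

$$\sigma^2(\hat{Y}_h) = \begin{bmatrix} 1 & X_h \end{bmatrix} \begin{bmatrix} \sigma^2(b_0) & \sigma(b_0, b_1) \\ \sigma(b_1, b_0) & \sigma^2(b_1) \end{bmatrix} \begin{bmatrix} 1 \\ X_h \end{bmatrix}$$

or:

(6.86)  $\sigma^2(\hat{Y}_h) = \sigma^2(b_0) + 2X_h\sigma(b_0, b_1) + X_h^2\sigma^2(b_1)$

Using the results from (6.78a), we obtain:

$$\sigma^2(\hat{Y}_h) = \frac{\sigma^2 \sum X_i^2}{n\sum(X_i - \bar{X})^2} + \frac{2X_h(-\bar{X})\sigma^2}{\sum(X_i - \bar{X})^2} + \frac{X_h^2\sigma^2}{\sum(X_i - \bar{X})^2}$$

which reduces to the familiar expression:

(6.87)  $\sigma^2(\hat{Y}_h) = \sigma^2\left[\dfrac{1}{n} + \dfrac{(X_h - \bar{X})^2}{\sum(X_i - \bar{X})^2}\right]$

Thus, we see explicitly that the variance expression in (6.87) contains contributions
from $\sigma^2(b_0)$, $\sigma^2(b_1)$, and $\sigma(b_0, b_1)$, which it must according to theorem (1.25b) since $\hat{Y}_h$
is a linear combination of $b_0$ and $b_1$:

$$\hat{Y}_h = b_0 + b_1 X_h$$

3.  We do not show the results in matrix terms for other types of inferences, such as
simultaneous prediction of several new observations on Y at different $X_h$ levels, since
these are based on results we have developed.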


6.13  WEIGHTED  LEAST  SQUARES 

The  regression  results  for  weighted  least  squares  can  be  stated  compactly  with 
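matrix algebra. Let the matrix W be a diagonal matrix containing the weights $w_i$:

(6.88)  $\underset{n \times n}{\mathbf{W}} = \begin{bmatrix} w_1 & 0 & \cdots & 0 \\ 0 & w_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & w_n \end{bmatrix}$
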

The  weighted  least  squares  normal  equations  (5.32)  can  then  be  expressed  as 
follows: 


(6.89)  X'WXb  =  X'WY 
and  the  weighted  least  squares  estimators  are: 

(6.90)  $\underset{2 \times 1}{\mathbf{b}} = (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{Y}$


Note  that  if  W  =  I,  as  it  would  for  unweighted  least  squares,  (6.90)  reduces 
to  the  unweighted  estimators  (6.59). 
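
For simple regression the products X'WX and X'WY reduce to weighted sums, so (6.90) can be sketched without any matrix library. The function name, the data and the weights below are all hypothetical; they are chosen only to illustrate the calculation and the reduction to ordinary least squares when every weight equals 1.

```python
def weighted_least_squares(x, y, w):
    """Solve (X'WX) b = X'WY for simple regression with diagonal W."""
    sw = sum(w)                                              # sum of w_i
    swx = sum(wi * xi for wi, xi in zip(w, x))               # sum of w_i X_i
    swx2 = sum(wi * xi * xi for wi, xi in zip(w, x))         # sum of w_i X_i^2
    swy = sum(wi * yi for wi, yi in zip(w, y))               # sum of w_i Y_i
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))  # sum of w_i X_i Y_i

    # X'WX = [[sw, swx], [swx, swx2]] and X'WY = [swy, swxy].
    det = sw * swx2 - swx ** 2
    if det == 0:
        raise ValueError("X'WX is singular; no unique solution")
    b0 = (swx2 * swy - swx * swxy) / det
    b1 = (sw * swxy - swx * swy) / det
    return b0, b1


# Hypothetical data and weights (assumed for illustration only).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]

b_unweighted = weighted_least_squares(x, y, [1.0, 1.0, 1.0, 1.0])  # W = I
b_weighted = weighted_least_squares(x, y, [1.0, 1.0, 0.5, 0.25])

print(b_unweighted)
print(b_weighted)
```

Observations with small weights pull the fitted line less strongly, which is the intended behavior when those observations have larger error variances.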

Other results for weighted least squares bear a similar relation to the earlier
results for unweighted least squares. For instance, when the error term variances
$\sigma_i^2$ are not equal, the weights $w_i$ are chosen to be inversely proportional to $\sigma_i^2$, so
that $\sigma_i^2 = \sigma^2 / w_i$. The variance-covariance matrix of the weighted least squares
estimators, then, is:

(6.91)  $\underset{2 \times 2}{\sigma^2(\mathbf{b})} = \sigma^2(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}$

and the estimated variance-covariance matrix is:

(6.92)  $\underset{2 \times 2}{s^2(\mathbf{b})} = MSE_w(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}$

where $MSE_w$ is based on the weighted squared deviations:

(6.92a)  $MSE_w = \dfrac{\sum w_i (Y_i - \hat{Y}_i)^2}{n - 2}$


6.14  RESIDUALS 

For later analysis of residuals, it will be useful to recognize that each residual
$e_i$ can be expressed as a linear combination of the observations $Y_i$. It can be
shown that the vector of the residuals e, defined in (6.65), equals:

(6.93)  $\underset{n \times 1}{\mathbf{e}} = (\underset{n \times n}{\mathbf{I}} - \underset{n \times n}{\mathbf{H}}) \underset{n \times 1}{\mathbf{Y}}$

where:

(6.93a)  $\underset{n \times n}{\mathbf{H}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$

Note from (6.76c) that the matrix $\mathbf{I} - \mathbf{H}$ is the matrix of the quadratic form
(6.75c) for SSE.



The square $n \times n$ matrix H is called the hat matrix and plays an important role
in regression analysis, as we shall see in Chapter 11 when we consider whether
or not regression results are unduly influenced by one or a few observations. The
matrix $\mathbf{I} - \mathbf{H}$ is symmetric and has the special property (called idempotency):

(6.94)  (I  -  H)(I  -  H)  =  I  -  H 

In  general,  a  matrix  M  is  said  to  be  idempotent  if  MM  =  M. 
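
The symmetry and idempotency of I - H, and the residual formula (6.93), can be checked on a small example. The data are hypothetical, exact fractions are used so that the equality tests are not disturbed by rounding, and the helper functions are illustrative only.

```python
from fractions import Fraction


def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical data (assumed for illustration only).
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
n = len(x)

X = [[Fraction(1), Fraction(xi)] for xi in x]
XtX_inv = inverse_2x2(matmul(transpose(X), X))

# Hat matrix H = X (X'X)^{-1} X' and the matrix I - H.
H = matmul(matmul(X, XtX_inv), transpose(X))
I_minus_H = [[(1 if i == j else 0) - H[i][j] for j in range(n)]
             for i in range(n)]

is_symmetric = transpose(I_minus_H) == I_minus_H
is_idempotent = matmul(I_minus_H, I_minus_H) == I_minus_H

# Residuals e = (I - H) Y, as in (6.93).
Y = [[Fraction(v)] for v in y]
residuals = [row[0] for row in matmul(I_minus_H, Y)]

print(is_symmetric, is_idempotent)
print(residuals)
```

The residuals obtained this way can be compared with Y - Xb computed directly; the two routes should agree exactly.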

It can be shown that the variance-covariance matrix of the vector of residuals
e also involves the matrix $\mathbf{I} - \mathbf{H}$:

(6.95)  $\sigma^2(\mathbf{e}) = \sigma^2(\mathbf{I} - \mathbf{H})$

and is estimated by:

(6.96)  $s^2(\mathbf{e}) = MSE(\mathbf{I} - \mathbf{H})$

Note 

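The variance-covariance matrix of e can be derived by means of (6.49). Since
$\mathbf{e} = (\mathbf{I} - \mathbf{H})\mathbf{Y}$, we obtain:

$$\sigma^2(\mathbf{e}) = (\mathbf{I} - \mathbf{H})\sigma^2(\mathbf{Y})(\mathbf{I} - \mathbf{H})'$$

Now $\sigma^2(\mathbf{Y}) = \sigma^2(\boldsymbol{\varepsilon}) = \sigma^2\mathbf{I}$ for the normal error model according to (6.55). Also,
$(\mathbf{I} - \mathbf{H})' = \mathbf{I} - \mathbf{H}$ because of the symmetry of the matrix. Hence:

$$\begin{aligned} \sigma^2(\mathbf{e}) &= \sigma^2(\mathbf{I} - \mathbf{H})\mathbf{I}(\mathbf{I} - \mathbf{H}) \\ &= \sigma^2(\mathbf{I} - \mathbf{H})(\mathbf{I} - \mathbf{H}) \end{aligned}$$

In view of property (6.94), we obtain formula (6.95):

$$\sigma^2(\mathbf{e}) = \sigma^2(\mathbf{I} - \mathbf{H})$$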


PROBLEMS 


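6.1.  For the matrices below, obtain: (1) A + B, (2) A - B, (3) AC, (4) AB', (5) B'A.

$$\mathbf{A} = \begin{bmatrix} 1 & 4 \\ 2 & 6 \\ 3 & 8 \end{bmatrix} \qquad \mathbf{B} = \begin{bmatrix} 1 & 3 \\ 1 & 4 \\ 2 & 5 \end{bmatrix}$$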

$$\mathbf{C} = \begin{bmatrix} 3 & 8 & 1 \\ 5 & 4 & 0 \end{bmatrix}$$


State  the  dimension  of  each  resulting  matrix. 


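6.2.  For the matrices below, obtain: (1) A + C, (2) A - C, (3) B'A, (4) AC', (5)
C'A.

$$\mathbf{A} = \begin{bmatrix} 2 & 1 \\ 3 & 5 \\ 5 & 7 \\ 4 & 8 \end{bmatrix} \qquad \mathbf{B} = \begin{bmatrix} 6 \\ 9 \\ 3 \\ 1 \end{bmatrix} \qquad \mathbf{C} = \begin{bmatrix} 3 & 8 \\ 8 & 6 \\ 5 & 1 \\ 2 & 4 \end{bmatrix}$$
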

State  the  dimension  of  each  resulting  matrix. 




6.3.  Show how the following expressions are written in terms of matrices: (1) $Y_i - \hat{Y}_i
= e_i$, (2) $\sum X_i e_i = 0$. Assume $i = 1, \ldots, 4$.

6.4.  Flavor deterioration. The results shown below were obtained in a small-scale
experiment to study the relation between °F of storage temperature (X) and number
of weeks before flavor deterioration of a food product begins to occur (Y).

i:      1      2      3      4      5

X_i:   +8     +4      0     -4     -8

Y_i:   7.8    9.0   10.2   11.0   11.7


Assume  that  the  first-order  regression  model  (3.1)  is  applicable.  Using  matrix 
methods,  find:  (1)  Y'Y,  (2)  X'X,  (3)  X'Y. 

6.5.  Consumer finance. The data below show for a consumer finance company
operating in six cities, the number of competing loan companies operating in the
city (X) and the number per thousand of the company's loans made in that city that
are currently delinquent (Y):

i:      1     2     3     4     5     6

X_i:    4     1     2     3     3     4

Y_i:   16     5    10    15    13    22


Assume  that  the  first-order  regression  model  (3.1)  is  applicable.  Using  matrix 
methods,  find:  (1)  Y'Y,  (2)  X'X,  (3)  X'Y. 

6.6.  Refer  to  Airfreight  breakage  Problem  2.17.  Using  matrix  methods,  find:  (1) 
Y'Y,  (2)  X'X,  (3)  X'Y. 


6.7.  Refer  to  Plastic  hardness  Problem  2.18.  Using  matrix  methods,  find:  (1)  Y'Y, 
(2)  X'X,  (3)  X'Y. 


6.8.  Let B be defined as follows:

$$\mathbf{B} = \begin{bmatrix} 1 & 5 & 0 \\ 1 & 0 & 5 \\ 1 & 0 & 5 \end{bmatrix}$$


a.  Are  the  column  vectors  of  B  linearly  dependent? 

b.  What  is  the  rank  of  B? 

c.  What  must  be  the  determinant  of  B? 


6.9.  Let A be defined as follows:

$$\mathbf{A} = \begin{bmatrix} 0 & 1 & 8 \\ 0 & 3 & 1 \\ 0 & 5 & 5 \end{bmatrix}$$



a.  Are  the  column  vectors  of  A  linearly  dependent? 

b.  Restate  definition  (6.22)  in  terms  of  row  vectors.  Are  the  row  vectors  of  A 
linearly  dependent? 

c.  What  is  the  rank  of  A? 

d.  Calculate  the  determinant  of  A. 


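6.10.  Find the inverse of each of the following matrices:

$$\mathbf{A} = \begin{bmatrix} 2 & 4 \\ 3 & 1 \end{bmatrix} \qquad \mathbf{B} = \begin{bmatrix} 4 & 3 & 2 \\ 6 & 5 & 10 \\ 10 & 1 & 6 \end{bmatrix}$$
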

Check  in  each  case  that  the  resulting  matrix  is  indeed  the  inverse. 




6.11.  Find the inverse of the following matrix:

$$\mathbf{A} = \begin{bmatrix} 5 & 1 & 3 \\ 4 & 0 & 5 \\ 1 & 9 & 6 \end{bmatrix}$$

Check that the resulting matrix is indeed the inverse.

6.12.  Refer to Flavor deterioration Problem 6.4. Find $(\mathbf{X}'\mathbf{X})^{-1}$.

6.13.  Refer to Consumer finance Problem 6.5. Find $(\mathbf{X}'\mathbf{X})^{-1}$.

6.14.  Consider the simultaneous equations:

$$\begin{aligned} 4y_1 + 7y_2 &= 25 \\ 2y_1 + 3y_2 &= 12 \end{aligned}$$

a.  Write these equations in matrix notation.

b.  Using matrix methods, find the solutions for $y_1$ and $y_2$.

6.15.  Consider the simultaneous equations:

$$\begin{aligned} 5y_1 + 2y_2 &= 8 \\ 23y_1 + 7y_2 &= 28 \end{aligned}$$

a.  Write these equations in matrix notation.

b.  Using matrix methods, find the solutions for $y_1$ and $y_2$.

6.16.  Consider the estimated linear regression function in the form of (2.15). Write
expressions in this form for the fitted values $\hat{Y}_i$ in matrix terms for $i = 1, \ldots, 5$.

6.17.  Consider the following functions of the random variables $Y_1$, $Y_2$, and $Y_3$:

$$\begin{aligned} W_1 &= Y_1 + Y_2 + Y_3 \\ W_2 &= Y_1 - Y_2 \\ W_3 &= Y_1 - Y_2 - Y_3 \end{aligned}$$

a.  State  the  above  in  matrix  notation. 

b.  Find  the  expectation  of  the  random  vector  W. 

c.  Find  the  variance-covariance  matrix  of  W. 

6.18.  Consider the following functions of the random variables $Y_1$, $Y_2$, $Y_3$, and $Y_4$:

$$\begin{aligned} W_1 &= \tfrac{1}{4}(Y_1 + Y_2 + Y_3 + Y_4) \\ W_2 &= \tfrac{1}{2}(Y_1 + Y_2) - \tfrac{1}{2}(Y_3 + Y_4) \end{aligned}$$

a.  State  the  above  in  matrix  notation. 

b.  Find  the  expectation  of  the  random  vector  W. 

c.  Find  the  variance-covariance  matrix  of  W. 

6.19.  Find the matrix A of the quadratic form:

$$3Y_1^2 + 10Y_1Y_2 + 17Y_2^2$$

6.20.  Find the matrix A of the quadratic form:

$$7Y_1^2 - 8Y_1Y_2 + 8Y_2^2$$


6.21.  For the matrix:

$$\mathbf{A} = \begin{bmatrix} 5 & 2 \\ 2 & 1 \end{bmatrix}$$

find the quadratic form of the observations $Y_1$ and $Y_2$.

6.22.  For the matrix:

$$\mathbf{A} = \begin{bmatrix} 1 & 0 & 4 \\ 0 & 3 & 0 \\ 4 & 0 & 9 \end{bmatrix}$$

find the quadratic form of the observations $Y_1$, $Y_2$, and $Y_3$.

6.23.  Refer  to  Flavor  deterioration  Problems  6.4  and  6.12. 

a.  Using matrix methods, obtain the following: (1) vector of estimated regression
coefficients, (2) vector of residuals, (3) SSR, (4) SSE, (5) estimated
variance-covariance matrix of b, (6) point estimate of $E(Y_h)$ when $X_h = -6$,
(7) estimated variance of $\hat{Y}_h$ when $X_h = -6$.

b.  What simplifications arose from the spacing of the X levels in the experiment?

c.  Using  matrix  methods,  obtain  the  numerator  of  the  left  term  in  (6.80). 

6.24.  Refer  to  Consumer  finance  Problems  6.5  and  6.13. 

a.  Using matrix methods, obtain the following: (1) vector of estimated regression
coefficients, (2) vector of residuals, (3) SSR, (4) SSE, (5) estimated
variance-covariance matrix of b, (6) point estimate of $E(Y_h)$ when $X_h = 4$, (7)
estimated variance of $Y_{h(\text{new})}$ when $X_h = 4$.

b.  From your estimated variance-covariance matrix in part (a5), obtain the following:
(1) $s(b_0, b_1)$; (2) $s^2(b_0)$; (3) $s(b_1)$.


6.25.  Refer  to  Airfreight  breakage  Problems  2.17  and  6.6. 

a.  Using matrix methods, obtain the following: (1) $(\mathbf{X}'\mathbf{X})^{-1}$, (2) b, (3) e, (4)
SSR, (5) SSE, (6) $s^2(\mathbf{b})$, (7) $\hat{Y}_h$ when $X_h = 2$, (8) $s^2(\hat{Y}_h)$ when $X_h = 2$.

b.  From part (a6), obtain the following: (1) $s^2(b_1)$; (2) $s(b_0, b_1)$; (3) $s(b_0)$.

6.26.  Refer  to  Plastic  hardness  Problems  2.18  and  6.7. 

a.  Using matrix methods, obtain the following: (1) $(\mathbf{X}'\mathbf{X})^{-1}$, (2) b, (3) $\hat{\mathbf{Y}}$, (4)
SSR, (5) SSE, (6) $s^2(\mathbf{b})$, (7) $s^2(Y_{h(\text{new})})$ when $X_h = 30$.

b.  From part (a6), obtain the following: (1) $s^2(b_0)$; (2) $s(b_0, b_1)$; (3) $s(b_1)$.


EXERCISES 

6.27.  Refer  to  regression  model  (5.13).  Set  up  the  expectation  vector  for  e.  Assume  that 
i  =  1, ... ,4. 

6.28. Consider model (5.13) for regression through the origin and the estimator $b_1$ given
in (5.17). Obtain (5.17) by utilizing (6.59) with X suitably defined.

6.29.  Consider  the  least  squares  estimator  b  given  in  (6.59).  Using  matrix  methods, 
show  that  b  is  an  unbiased  estimator. 

6.30. Show that $\hat{Y}_h$ in (6.82) can be expressed in matrix terms as $\mathbf{b}'\mathbf{X}_h$.




6.31. Refer to regression model (5.36). Set up the variance-covariance matrix for the
error terms when $i = 1, \ldots, 4$. Assume $\sigma(\varepsilon_i, \varepsilon_j) = 0$ for $i \ne j$.

6.32. Derive the variance-covariance matrix $\sigma^2(\mathbf{b})$ in (6.91) for the weighted least
squares estimators when the variance-covariance matrix of the observations $Y_i$ is
$\sigma^2\mathbf{W}^{-1}$, where W is given in (6.88).

6.33. a. Obtain an expression for $\hat{\mathbf{Y}}$ in terms of the H matrix defined in (6.93a). [Hint:
Use (6.93).]

b. Obtain an expression for the variance-covariance matrix of the fitted values
$\hat{Y}_i$, $i = 1, \ldots, n$, in terms of the hat matrix.

CITED  REFERENCE 

6.1 Graybill, Franklin A. Introduction to Matrices with Applications in Statistics.
Belmont, Calif.: Wadsworth, 1969.


7 


Multiple  regression — I 


Multiple  regression  analysis  is  one  of  the  most  widely  used  of  all  statistical 
tools.  In  this  chapter,  we  first  discuss  a  variety  of  multiple  regression  models. 
Then  we  present  the  basic  statistical  results  for  multiple  regression  in  matrix 
form.  Since  the  matrix  expressions  for  multiple  regression  are  the  same  as  for 
simple  regression,  we  state  the  results  without  much  discussion.  We  then  give  an 
example, illustrating a variety of inferences in multiple regression analysis.
Finally, we take up some additional facets of multiple regression analysis.

7.1  MULTIPLE  REGRESSION  MODELS 

Need  for  several  independent  variables 

When we first introduced regression analysis in Chapter 2, we spoke of regression
models containing a number of independent variables. We mentioned a
regression model where the dependent variable was direct operating cost for a
branch office of a consumer finance chain, and four independent variables were
considered, including average number of loans outstanding at the branch and
total number of new loan applications processed by the branch. We also mentioned
a tractor purchase study where the response variable was volume of tractor
purchases in a sales territory, and the nine independent variables included number
of farms in the territory and quantity of crop production in the territory. In
addition, we mentioned a study of short children where the response variable was
the peak plasma growth hormone level, and the 14 independent variables included
sex, age, and various body measurements. In all these examples, one
independent variable in the model would have provided an inadequate description
since a number of key independent variables affect the response variable in
important and distinctive ways. Furthermore, in situations of this type, one will
frequently find that predictions of the response variable based on a model containing
only a single independent variable are too imprecise to be useful. A more
complex model, containing additional independent variables, typically is more
helpful in providing sufficiently precise predictions of the response variable.

In each of the examples mentioned, the analysis is based on observational data
because some or all of the independent variables are not susceptible to direct
control. Multiple regression analysis is also highly useful in experimental situations
where the experimenter can control the independent variables. An experimenter
typically will wish to investigate a number of independent variables simultaneously
because almost always more than one key independent variable
influences the response. For example, in a study on productivity of work crews,
the experimenter may wish to control both the size of the crew and the level of
bonus pay. Similarly, in a study on responsiveness to a drug, the experimenter
may wish to control both the dose of the drug and the body surface area of the
subject.

First-order  model  with  two  independent  variables 

When there are two independent variables $X_1$ and $X_2$, the model:

(7.1)  $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$

is called a first-order model with two independent variables. A first-order model,
it will be recalled from Chapter 2, is linear in the parameters and linear in the
independent variables. $Y_i$ denotes as usual the response in the $i$th trial, and $X_{i1}$
and $X_{i2}$ are the values of the two independent variables in the $i$th trial. The
parameters of the model are $\beta_0$, $\beta_1$, and $\beta_2$, and the error term is $\varepsilon_i$.

Assuming that $E(\varepsilon_i) = 0$, the regression function for model (7.1) is:

(7.2)  $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$

Analogous to simple linear regression, where the regression function
$E(Y) = \beta_0 + \beta_1 X$ is a line, the regression function (7.2) is a plane. Figure 7.1
contains a representation of a portion of the response plane:

(7.3)  $E(Y) = 20.0 + .95X_1 - .50X_2$

Note that a point on the response plane (7.3) corresponds to the mean response
$E(Y)$ at the given combination of levels of $X_1$ and $X_2$.

Figure 7.1 also shows a series of observations $Y_i$ corresponding to given levels
of the two independent variables $(X_{i1}, X_{i2})$. Note that each vertical rule in
Figure 7.1 represents the difference between $Y_i$ and the mean $E(Y_i)$ of the probability
distribution for $(X_{i1}, X_{i2})$ on the response plane. Hence, the vertical distance
from $Y_i$ to the response plane represents the error term $\varepsilon_i = Y_i - E(Y_i)$.

FIGURE 7.1  Example of response surface: a response plane with observations
scattered about it

Frequently the regression function in multiple regression is called a regression
surface or a response surface. In Figure 7.1, the response surface is just a simple
plane, but in other cases the response surface may be complex in nature.

Meaning of regression coefficients. Let us now consider the meaning of the
regression parameters in the multiple regression function (7.2). The parameter $\beta_0$
is the Y intercept of the regression plane. If the scope of the model includes
$X_1 = 0$, $X_2 = 0$, $\beta_0$ gives the mean response at $X_1 = 0$, $X_2 = 0$. Otherwise, $\beta_0$
does not have any particular meaning as a separate term in the regression model.

The parameter $\beta_1$ indicates the change in the mean response per unit increase
in $X_1$ when $X_2$ is held constant. Likewise, $\beta_2$ indicates the change in the mean
response per unit increase in $X_2$ when $X_1$ is held constant. To see this for our
example, suppose $X_2$ is held at the level $X_2 = 20$. The regression function (7.3)
now is:

(7.4)  $E(Y) = 20.0 + .95X_1 - .50(20) = (20.0 - 10.0) + .95X_1 = 10.0 + .95X_1$

Note that for $X_2 = 20$, the response function is a straight line with slope .95. The
same is true for any other value of $X_2$; only the intercept of the response function
differs. Hence, $\beta_1 = .95$ indicates that the mean response increases by .95 with a
unit increase in $X_1$ when $X_2$ is constant, no matter what the level of $X_2$. More
loosely speaking, we state that $\beta_1$ indicates the change in $E(Y)$ with a unit
increase in $X_1$ when $X_2$ is held constant.

Similarly, $\beta_2 = -.50$ in model (7.3) indicates that the mean response decreases
by .50 with a unit increase in $X_2$ when $X_1$ is held constant.

When the effect of $X_1$ on the mean response does not depend on the level of
$X_2$, and correspondingly the effect of $X_2$ does not depend on the level of $X_1$, the
two independent variables are said to have additive effects or not to interact.
Thus, the first-order model (7.1) is designed for independent variables whose
effects on the mean response are additive or do not interact.

The parameters $\beta_1$ and $\beta_2$ are frequently called partial regression coefficients
because they reflect the partial effect of one independent variable when the other
independent variable is included in the model and is held constant.
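
The additive interpretation can be checked numerically. The following Python sketch
evaluates the response plane (7.3) at several levels of $X_2$; the function names and the
particular levels chosen are illustrative assumptions, not part of the text.

```python
def mean_response(x1, x2):
    """Mean response E(Y) on the response plane (7.3)."""
    return 20.0 + 0.95 * x1 - 0.50 * x2


def unit_effect_of_x1(x1, x2):
    """Change in E(Y) when X1 rises by one unit with X2 held at x2."""
    return mean_response(x1 + 1, x2) - mean_response(x1, x2)


# With additive effects, the effect of X1 is the same at every level of X2.
effects = [round(unit_effect_of_x1(100, x2), 6) for x2 in (10, 20, 30)]
print(effects)
```

Each entry of the printed list equals the partial regression coefficient .95, whatever
level of $X_2$ is used.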


Example. Suppose that the response surface in (7.3) pertains to urban full-service
stations of a major oil company and shows the effect of variety and
adequacy of services ($X_1$) and average time taken to reach car ($X_2$) on the ratio of
actual gallonage of gasoline sold to potential gallonage ($Y$), where $X_1$ is expressed
as an index with 100 = average, $X_2$ is in seconds, and $Y$ is stated as a
percent. Increasing the index of adequacy of services by one point while holding
average time to reach car constant leads to an increase of .95 percentage point in the
expected ratio of actual to potential gallonage. If the index of adequacy of services
is held constant and the average time to reach car is increased by one second,
the expected ratio of actual to potential gallonage decreases by .50 percentage point.

Comments 


1. A regression model for which the response surface is a plane can be used either in
its own right when it is appropriate, or as an approximation to a more complex response
surface. Many complex response surfaces can be approximated well by a plane for limited
ranges of $X_1$ and $X_2$.

2. We can readily establish the meaning of $\beta_1$ and $\beta_2$ by calculus, taking partial
derivatives of the response surface (7.2) with respect to $X_1$ and $X_2$ in turn:

$\dfrac{\partial E(Y)}{\partial X_1} = \beta_1 \qquad \dfrac{\partial E(Y)}{\partial X_2} = \beta_2$

The partial derivatives measure the rate of change in $E(Y)$ with respect to one independent
variable when the other is held constant.


First-order  model  with  more  than  two  independent  variables 

We consider now the case where there are $p - 1$ independent variables
$X_1, \ldots, X_{p-1}$. The model:

(7.5)  $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$

is called a first-order model with $p - 1$ independent variables. It can also be
written:

(7.5a)  $Y_i = \beta_0 + \sum_{k=1}^{p-1} \beta_k X_{ik} + \varepsilon_i$

or, if we let $X_{i0} = 1$, it can be written as:

(7.5b)  $Y_i = \sum_{k=0}^{p-1} \beta_k X_{ik} + \varepsilon_i$  where $X_{i0} = 1$

Assuming that $E(\varepsilon_i) = 0$, the response function for model (7.5) is:

(7.6)  $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{p-1} X_{p-1}$

This response function is a hyperplane, which is a plane in more than two dimensions.
It is no longer possible to picture this response surface, as we were able to
do in Figure 7.1 for the case of two independent variables. Nevertheless, the
meaning of the parameters is analogous to the two independent variables case.
The parameter $\beta_k$ indicates the change in the mean response $E(Y)$ with a unit
increase in the independent variable $X_k$, when all other independent variables $X_1$,
$X_2$, etc., included in the model are held constant. Note again that the effect of
any independent variable on the mean response is the same for model (7.5), no
matter what are the levels at which the other independent variables are held.
Hence, the first-order model (7.5) is designed for independent variables whose
effects on the mean response are additive and therefore do not interact.

Note 

If $p - 1 = 1$, model (7.5) reduces to:

$Y_i = \beta_0 + \beta_1 X_{i1} + \varepsilon_i$

which is the simple linear regression model considered in earlier chapters.

General  linear  regression  model 

In general, the variables $X_1, \ldots, X_{p-1}$ in a regression model do not have to
represent different independent variables, as we shall shortly see. We therefore
define the general linear regression model, with normal error terms, simply in
terms of X variables:

(7.7)  $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$

where:

$\beta_0, \beta_1, \ldots, \beta_{p-1}$ are parameters
$X_{i1}, \ldots, X_{i,p-1}$ are known constants
$\varepsilon_i$ are independent $N(0, \sigma^2)$
$i = 1, \ldots, n$

If we let $X_{i0} = 1$, model (7.7) can be written as follows:

(7.7a)  $Y_i = \beta_0 X_{i0} + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$  where $X_{i0} = 1$

or:

(7.7b)  $Y_i = \sum_{k=0}^{p-1} \beta_k X_{ik} + \varepsilon_i$  where $X_{i0} = 1$

The response function for model (7.7) is, since $E(\varepsilon_i) = 0$:

(7.8)  $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{p-1} X_{p-1}$

Thus, the general linear regression model implies that the observations $Y_i$ are
independent normal variables, with mean $E(Y_i)$ as given by (7.8) and with constant
variance $\sigma^2$.

This general linear model encompasses a vast variety of situations. We shall
consider a few of these now:

$p - 1$ independent variables. When $X_1, \ldots, X_{p-1}$ represent $p - 1$ different
independent variables, the general linear model (7.7) is, as we have seen, a
first-order model in which there are no interacting effects between the independent
variables.

Polynomial regression. Consider the curvilinear regression model with one
independent variable:

(7.9)  $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i$

If we let $X_{i1} = X_i$ and $X_{i2} = X_i^2$, we can write (7.9) as follows:

$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$

so that model (7.9) is a particular case of the general linear regression model.
While (7.9) illustrates a curvilinear model where the response function is quadratic,
models with higher degree polynomial response functions are also particular
cases of the general linear regression model.

Transformed variables. Consider the model:

(7.10)  $\log Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i$

Here, the response surface is a highly complex one, yet model (7.10) can be
treated as a general linear regression model. If we let $Y_i' = \log Y_i$, we can write
model (7.10) as follows:

$Y_i' = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i$

which is in the form of the general linear regression model. The dependent
variable just happens to be measured as the logarithm of Y.

Many models can be transformed into general linear regression models. Thus,
the model:

(7.11)  $Y_i = \dfrac{1}{\beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i}$

can be transformed to a general linear regression model by letting $Y_i' = 1/Y_i$. We
then have:

$Y_i' = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$

Interaction effects. Consider the model in two independent variables $X_1$ and
$X_2$:

(7.12)  $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1} X_{i2} + \varepsilon_i$

The meaning of $\beta_1$ and $\beta_2$ here is not the same as that given earlier because of the
cross-product term $\beta_3 X_{i1} X_{i2}$. It can be shown that the change in the mean response
with a unit increase in $X_1$ when $X_2$ is held constant is:

(7.13)  $\beta_1 + \beta_3 X_2$

Similarly, the change in the mean response with a unit increase in $X_2$ when $X_1$ is
held constant is:

(7.14)  $\beta_2 + \beta_3 X_1$

Hence, in model (7.12) both the effect of $X_1$ for given level of $X_2$ and the effect
of $X_2$ for given level of $X_1$ depend on the level of the other independent variable.

In Figure 7.2, we illustrate the effect of the cross-product term in model
(7.12). In Figure 7.2a, we consider a response function without a cross-product
term:

$E(Y) = 10 + 2X_1 + 5X_2$

and show there the response function $E(Y)$ when $X_2 = 1$ and when $X_2 = 3$. Note
that the mean response increases by the amount $\beta_1 = 2$ with a unit increase of
$X_1$, whether $X_2 = 1$ or $X_2 = 3$.

In Figure 7.2b, we consider the same response function but with the cross-product
term $.5X_1X_2$ added:

$E(Y) = 10 + 2X_1 + 5X_2 + .5X_1X_2$

and show the response function $E(Y)$ when $X_2 = 1$ and when $X_2 = 3$. Note that
the slope of the response function when plotted against $X_1$ now differs for $X_2 = 1$
and $X_2 = 3$. The slope of the response function when $X_2 = 1$ is by (7.13):

$\beta_1 + \beta_3 X_2 = 2 + .5(1) = 2.5$

and when $X_2 = 3$, the slope is:

$\beta_1 + \beta_3 X_2 = 2 + .5(3) = 3.5$

Hence, $\beta_1$ in model (7.12) containing a cross-product term no longer indicates
the change in the mean response for a unit increase in $X_1$ for any given $X_2$ level.
That effect in this model depends on the level of $X_2$. Model (7.12) with the
cross-product term is therefore designed for independent variables whose effects
on the dependent variable interact. The cross-product term $\beta_3 X_{i1} X_{i2}$ is called an
interaction term. While the mean response in model (7.12) when $X_2$ is constant is
still a linear function of $X_1$, now both the intercept and the slope of the response
function change as the level at which $X_2$ is held constant is varied. The same
holds when the mean response is regarded as a function of $X_2$, with $X_1$ constant.

FIGURE 7.2  Effect of cross-product term in response function with two independent
variables

(a) $E(Y) = 10 + 2X_1 + 5X_2$. Slope is $\beta_1 = 2$ for all $X_2$.

(b) $E(Y) = 10 + 2X_1 + 5X_2 + .5X_1X_2$. Slope is 2.5 when $X_2 = 1$ but is 3.5 when $X_2 = 3$.

Despite these complexities of model (7.12), it can still be regarded as a general
linear regression model. Let $X_{i3} = X_{i1} X_{i2}$. We can then write (7.12) as follows:

$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i$

which is in the form of the general linear regression model.
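
The slopes quoted for Figure 7.2b can be reproduced with a short Python sketch. It
compares expression (7.13) with a direct one-unit difference of the response function;
the function names and the starting level $X_1 = 3$ are illustrative assumptions.

```python
def mean_response(x1, x2, b0=10.0, b1=2.0, b2=5.0, b3=0.5):
    """E(Y) for the response function of Figure 7.2b."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def slope_in_x1(x2, b1=2.0, b3=0.5):
    """Expression (7.13): change in E(Y) per unit increase in X1 at fixed X2."""
    return b1 + b3 * x2


for x2 in (1, 3):
    # One-unit difference in X1, taken at the arbitrary starting level X1 = 3.
    direct = mean_response(4, x2) - mean_response(3, x2)
    print(x2, slope_in_x1(x2), direct)
```

Both columns give 2.5 at $X_2 = 1$ and 3.5 at $X_2 = 3$, so the effect of $X_1$ depends on
the level at which $X_2$ is held.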

Note 

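To derive (7.13) and (7.14), we differentiate:

$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2$

with respect to $X_1$ and $X_2$, respectively:

$\dfrac{\partial E(Y)}{\partial X_1} = \beta_1 + \beta_3 X_2 \qquad \dfrac{\partial E(Y)}{\partial X_2} = \beta_2 + \beta_3 X_1$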


Combination of cases. A regression model may combine a number of the
elements we have just noted and still can be treated as a general linear regression
model. Consider a model with two independent variables, each in quadratic
form, with an interaction term:

(7.15)  $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i1}^2 + \beta_3 X_{i2} + \beta_4 X_{i2}^2 + \beta_5 X_{i1} X_{i2} + \varepsilon_i$

Let us define:

$Z_{i1} = X_{i1} \quad Z_{i2} = X_{i1}^2 \quad Z_{i3} = X_{i2} \quad Z_{i4} = X_{i2}^2 \quad Z_{i5} = X_{i1} X_{i2}$

We can then write model (7.15) as follows:

$Y_i = \beta_0 + \beta_1 Z_{i1} + \beta_2 Z_{i2} + \beta_3 Z_{i3} + \beta_4 Z_{i4} + \beta_5 Z_{i5} + \varepsilon_i$

which is in the form of the general linear regression model.
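
As a sketch of this redefinition, the Python fragment below maps one observation
$(X_{i1}, X_{i2})$ to the row $(1, Z_{i1}, \ldots, Z_{i5})$ and evaluates the mean response as a
linear combination of the Z variables. The helper names and the parameter values are
hypothetical and chosen only for illustration.

```python
def z_row(x1, x2):
    """Map (X_i1, X_i2) to (1, Z_i1, ..., Z_i5) as defined for model (7.15)."""
    return [1.0, x1, x1 ** 2, x2, x2 ** 2, x1 * x2]


def mean_response(beta, x1, x2):
    """E(Y) written as a linear combination of the Z variables."""
    return sum(b * z for b, z in zip(beta, z_row(x1, x2)))


beta = [1.0, 2.0, 0.5, -1.0, 0.25, 3.0]   # hypothetical parameter values
print(z_row(2.0, 4.0))
print(mean_response(beta, 2.0, 4.0))
```

The response surface is curved in $X_1$ and $X_2$, yet the computation is a plain weighted
sum of known constants, which is what places (7.15) within the general linear model.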

Comments 

1.  It  should  be  clear  from  the  various  examples  that  the  general  linear  regression 
model  (7.7)  is  not  restricted  to  linear  response  surfaces.  The  term  linear  model  refers  to 
the  fact  that  (7.7)  is  linear  in  the  parameters,  not  to  the  shape  of  the  response  surface. 

2.  Figure  7.3  illustrates  some  complex  response  surfaces  that  may  be  encountered 
when  there  are  two  independent  variables. 

FIGURE 7.3  Additional examples of response functions (panels a and b)


Interactions  and  nature  of  response  surface 

We introduced the concept of interacting independent variables earlier, and
now shall illustrate further how the response surface differs when two independent
variables do not interact and when they do interact.

Figure 7.4a contains a representation of a response surface in which the two
independent variables (mean season temperature, amount of rainfall) do not interact
on the dependent variable (corn yield). The absence of interactions can be
seen by considering the corn yield curves for given mean season temperatures as
a function of rainfall. These curves all have the same shape and differ only by a
constant. Thus, each ordinate of the corn yield curve when the mean temperature
is 70° is a constant number of units higher than the corresponding ordinate for the
corn yield curve when the mean temperature is 78°.

Equivalently, one can note the absence of interactions by considering the corn
yield curves for given amounts of rainfall as a function of temperature. Again,
these curves are the same in shape and differ only by a constant.


FIGURE 7.4  Response surfaces for additive and interacting independent
variables

(a) Independent Variables Do Not Interact: Yield of corn as function of season
rainfall and mean temperature

Absence of interactions therefore implies that the mean response $E(Y)$ can be
expressed in the form:

(7.16)  $E(Y) = f_1(X_1) + f_2(X_2)$

where $f_1$ and $f_2$ can be any functions, not necessarily simple ones.
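
One informal numerical check of the additive form (7.16), offered here as a sketch
rather than as a method from the text, uses the second difference of the response
function over a rectangle of levels: it is zero whenever $E(Y) = f_1(X_1) + f_2(X_2)$. The
two response functions are those of Figure 7.2; the function names and the levels used
are assumptions.

```python
def second_difference(f, a, a2, b, b2):
    """f(a2,b2) - f(a2,b) - f(a,b2) + f(a,b); zero when f = f1(X1) + f2(X2)."""
    return f(a2, b2) - f(a2, b) - f(a, b2) + f(a, b)


def additive(x1, x2):
    """Response function of Figure 7.2a (no cross-product term)."""
    return 10 + 2 * x1 + 5 * x2


def interacting(x1, x2):
    """Response function of Figure 7.2b (cross-product term .5 X1 X2)."""
    return 10 + 2 * x1 + 5 * x2 + 0.5 * x1 * x2


print(second_difference(additive, 1, 2, 1, 3))      # 0
print(second_difference(interacting, 1, 2, 1, 3))   # 1.0
```

The nonzero value for the second function equals $.5(2 - 1)(3 - 1)$, the contribution of
the cross-product term over the rectangle.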

Figure 7.4b illustrates a case where the two independent variables (age, percent
of normal weight) interact on the dependent variable (mortality ratio). Here,
the shape of the mortality ratio curve as a function of percent of normal weight
varies for different ages. For men 22 years old, both underweight and overweight
persons have higher mortality rates than normal (normal = 100) for that age. On
the other hand, for men 52 years old, the mortality rate is above normal for that
age for overweight persons but not for underweight persons. Similarly, the mortality
ratio curves as a function of age vary in shape for different weights.

FIGURE 7.4  (concluded)

(b) Independent Variables Interact: Mortality ratio for men as function of age and
percent of normal weight

Source: Reprinted, with permission, from M. Ezekiel and K. A. Fox, Methods
of Correlation and Regression Analysis, 3d ed. (New York: John Wiley & Sons,
1959), pp. 349-50.

We can illustrate the difference in the shape of the response function when the
two independent variables do and do not interact in yet another way, namely, by
representing the response surface by means of a contour diagram. Such a diagram
shows, for a number of different response levels, the various combinations
of the two independent variables which yield the same level of response.
Figure 7.5a shows a contour diagram for the response surface portrayed in
Figure 7.1:

$E(Y) = 20.0 + .95X_1 - .50X_2$

Note that the independent variables do not interact in this response function and
that the contour lines are parallel. Figure 7.5b shows a contour diagram for the
response function:

$E(Y) = 5X_1 + 1X_2 + 3X_1X_2$

where the two independent variables interact and the contour curves are not
parallel.

In general, additive or noninteracting independent variables lead to parallel
contour curves while interacting independent variables lead to nonparallel contour
curves.

FIGURE 7.5  Response contour diagrams

(a) $E(Y) = 20.0 + .95X_1 - .50X_2$: Noninteracting Independent Variables

(b) $E(Y) = 5X_1 + 1X_2 + 3X_1X_2$: Interacting Independent Variables

7.2  GENERAL  LINEAR  REGRESSION  MODEL 
IN  MATRIX  TERMS 

We shall now present the principal results for the general linear regression
model (7.7) in matrix terms. This model, as we have noted, encompasses a wide
variety of particular cases. The results to be presented are applicable to all of
these.

It is a remarkable property of matrix algebra that the results for the general
linear regression model (7.7) appear exactly the same in matrix notation as those
for the simple linear regression model (6.56). Only the degrees of freedom and
other constants related to the number of independent variables and the dimensions
of some matrices will be different. Hence, we shall be able to present the
results very concisely.

The matrix notation, to be sure, may hide enormous computational complexities.
The inverse of a 10 × 10 matrix A requires tremendous amounts of computation,
yet is simply represented as $\mathbf{A}^{-1}$. Our reason for emphasizing matrix
algebra is that it indicates the essential conceptual steps in the solution. The
actual computations will in all but the very simplest cases be done by programmable
calculator or computer. Hence, it does not matter for us whether $(\mathbf{X}'\mathbf{X})^{-1}$
represents finding the inverse of a 2 × 2 or a 10 × 10 matrix. The important
point is to know what the inverse of the matrix represents.

To express the general linear regression model (7.7):

$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$

in  matrix  terms,  we  need  to  define  the  following  matrices: 


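(7.17a)  $\underset{n \times 1}{\mathbf{Y}} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}$

(7.17b)  $\underset{n \times p}{\mathbf{X}} = \begin{bmatrix} 1 & X_{11} & X_{12} & \cdots & X_{1,p-1} \\ 1 & X_{21} & X_{22} & \cdots & X_{2,p-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{n,p-1} \end{bmatrix}$

(7.17c)  $\underset{p \times 1}{\boldsymbol{\beta}} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{bmatrix}$

(7.17d)  $\underset{n \times 1}{\boldsymbol{\varepsilon}} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$

Definitions (7.17a) through (7.17d) are referred to collectively as (7.17).

Note that the Y and $\boldsymbol{\varepsilon}$ vectors are the same as for simple regression. The $\boldsymbol{\beta}$ vector
contains additional regression parameters, and the X matrix contains a column of
1's as well as a column of the n values for each of the $p - 1$ X variables in the
regression model. The row subscript for each element $X_{ik}$ in the X matrix identifies
the trial, and the column subscript identifies the X variable.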

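In matrix terms, the general linear regression model (7.7) is:

(7.18)  $\underset{n \times 1}{\mathbf{Y}} = \underset{n \times p}{\mathbf{X}}\,\underset{p \times 1}{\boldsymbol{\beta}} + \underset{n \times 1}{\boldsymbol{\varepsilon}}$

where:

$\mathbf{Y}$ is a vector of observations
$\boldsymbol{\beta}$ is a vector of parameters
$\mathbf{X}$ is a matrix of constants
$\boldsymbol{\varepsilon}$ is a vector of independent normal random variables with expectation
$E(\boldsymbol{\varepsilon}) = \mathbf{0}$ and variance-covariance matrix $\sigma^2(\boldsymbol{\varepsilon}) = \sigma^2\mathbf{I}$

Consequently, the random vector Y has expectation:

(7.18a)  $E(\mathbf{Y}) = \mathbf{X}\boldsymbol{\beta}$

and the variance-covariance matrix of Y is:

(7.18b)  $\sigma^2(\mathbf{Y}) = \sigma^2\mathbf{I}$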

7.3  LEAST  SQUARES  ESTIMATORS 

Let us denote the vector of estimated regression coefficients $b_0, b_1, \ldots, b_{p-1}$
as b:

(7.19)  $\underset{p \times 1}{\mathbf{b}} = \begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ \vdots \\ b_{p-1} \end{bmatrix}$

The least squares normal equations for the general linear regression model (7.18)
are:

(7.20)  $\underset{p \times p}{(\mathbf{X}'\mathbf{X})}\,\underset{p \times 1}{\mathbf{b}} = \underset{p \times n}{\mathbf{X}'}\,\underset{n \times 1}{\mathbf{Y}}$

and the least squares estimators are:

(7.21)  $\underset{p \times 1}{\mathbf{b}} = \underset{p \times p}{(\mathbf{X}'\mathbf{X})^{-1}}\,\underset{p \times 1}{\mathbf{X}'\mathbf{Y}}$

For model (7.18), these least squares estimators are also maximum likelihood
estimators and have all the properties mentioned in Chapter 2: they are unbiased,
minimum variance unbiased, consistent, and sufficient.
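
To make the normal equations concrete, here is a minimal Python sketch using only the
standard library. It solves $(\mathbf{X}'\mathbf{X})\mathbf{b} = \mathbf{X}'\mathbf{Y}$ by Gauss-Jordan elimination rather than
by forming the inverse in (7.21) explicitly; the two routes agree when $\mathbf{X}'\mathbf{X}$ is
nonsingular, which the sketch assumes without checking. The data and all function
names are hypothetical.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, rhs):
    """Solve a x = rhs by Gauss-Jordan elimination with partial pivoting.

    Assumes the square matrix a is nonsingular.
    """
    n = len(a)
    m = [row[:] + [r] for row, r in zip(a, rhs)]   # augmented matrix
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(n):
            if r != c:
                factor = m[r][c] / m[c][c]
                m[r] = [v - factor * w for v, w in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def least_squares(x, y):
    """Return b satisfying the normal equations (7.20)."""
    xt = transpose(x)
    xtx = matmul(xt, x)                                        # X'X
    xty = [sum(v * w for v, w in zip(row, y)) for row in xt]   # X'Y
    return solve(xtx, xty)


# Hypothetical data: a column of 1's followed by the X1 and X2 columns.
X = [[1, -1, -1], [1, 1, -1], [1, -1, 1], [1, 1, 1], [1, 0, 0]]
Y = [1, 3, 4, 8, 4]
b = least_squares(X, Y)
print(b)
```

For these data $\mathbf{X}'\mathbf{X}$ is diagonal with entries 5, 4, 4 and $\mathbf{X}'\mathbf{Y} = (20, 6, 8)'$, so the
estimates are $b_0 = 4$, $b_1 = 1.5$, and $b_2 = 2$.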


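7.4  ANALYSIS OF VARIANCE RESULTS

Let the vector of the fitted values $\hat{Y}_i$ be denoted by $\hat{\mathbf{Y}}$ and the vector of the
residual terms $e_i = Y_i - \hat{Y}_i$ be denoted by e:

(7.22a)  $\underset{n \times 1}{\hat{\mathbf{Y}}} = \begin{bmatrix} \hat{Y}_1 \\ \hat{Y}_2 \\ \vdots \\ \hat{Y}_n \end{bmatrix}$

(7.22b)  $\underset{n \times 1}{\mathbf{e}} = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}$

Definitions (7.22a) and (7.22b) are referred to collectively as (7.22).

The fitted values are represented by:

(7.23)  $\hat{\mathbf{Y}} = \mathbf{X}\mathbf{b}$

and the residual terms by:

(7.24)  $\mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}} = \mathbf{Y} - \mathbf{X}\mathbf{b}$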

Sums  of  squares  and  mean  squares 

The sums of squares for the analysis of variance are:

(7.25)  $SSTO = \mathbf{Y}'\mathbf{Y} - \left(\dfrac{1}{n}\right)\mathbf{Y}'\mathbf{1}\mathbf{1}'\mathbf{Y}$

(7.26)  $SSR = \mathbf{b}'\mathbf{X}'\mathbf{Y} - \left(\dfrac{1}{n}\right)\mathbf{Y}'\mathbf{1}\mathbf{1}'\mathbf{Y}$

(7.27)  $SSE = \mathbf{e}'\mathbf{e} = (\mathbf{Y} - \mathbf{X}\mathbf{b})'(\mathbf{Y} - \mathbf{X}\mathbf{b}) = \mathbf{Y}'\mathbf{Y} - \mathbf{b}'\mathbf{X}'\mathbf{Y}$

where $\mathbf{1}$ is an $n \times 1$ vector of 1's as defined in (6.19).

SSTO, as usual, has $n - 1$ degrees of freedom associated with it. SSE has
$n - p$ degrees of freedom associated with it since $p$ parameters need to be estimated
in the regression function for model (7.18). Finally, SSR has $p - 1$ degrees
of freedom associated with it, representing the number of X variables
$X_1, \ldots, X_{p-1}$.
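
The three sums of squares and their degrees of freedom can be sketched in Python as
follows. The matrix products in (7.25) to (7.27) are written out as ordinary sums, using
the fact that $\mathbf{Y}'\mathbf{1}\mathbf{1}'\mathbf{Y} = (\sum Y_i)^2$. The data are hypothetical, the vector b is
taken as the least squares solution for these data, and the function name is an assumption.

```python
def anova_sums_of_squares(x, y, b):
    """Return (SSTO, SSR, SSE) from (7.25)-(7.27)."""
    n = len(y)
    yty = sum(v * v for v in y)                    # Y'Y
    correction = sum(y) ** 2 / n                   # (1/n) Y'11'Y
    xty = [sum(row[k] * v for row, v in zip(x, y)) for k in range(len(b))]  # X'Y
    bxty = sum(bk * t for bk, t in zip(b, xty))    # b'X'Y
    return yty - correction, bxty - correction, yty - bxty


# Hypothetical data: a column of 1's followed by the X1 and X2 columns.
X = [[1, -1, -1], [1, 1, -1], [1, -1, 1], [1, 1, 1], [1, 0, 0]]
Y = [1, 3, 4, 8, 4]
b = [4.0, 1.5, 2.0]   # least squares estimates for these data, taken as given

ssto, ssr, sse = anova_sums_of_squares(X, Y, b)
n, p = len(Y), len(b)
print(ssto, ssr, sse)          # SSTO = SSR + SSE
print(n - 1, p - 1, n - p)     # degrees of freedom for SSTO, SSR, SSE
```

Here SSTO = 26, SSR = 25, and SSE = 1, with 4, 2, and 2 degrees of freedom
respectively, so both the sums of squares and the degrees of freedom are additive.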

Table 7.1 shows these analysis of variance results, as well as the mean squares
MSR and MSE:

(7.28)  $MSR = \dfrac{SSR}{p - 1}$

(7.29)  $MSE = \dfrac{SSE}{n - p}$

The expectation of MSE is $\sigma^2$, as for simple regression. The expectation of
MSR is $\sigma^2$ plus a quantity that is nonnegative. For instance, when $p - 1 = 2$, we
have:

$E(MSR) = \sigma^2 + \left[\beta_1^2 \sum (X_{i1} - \bar{X}_1)^2 + \beta_2^2 \sum (X_{i2} - \bar{X}_2)^2 + 2\beta_1\beta_2 \sum (X_{i1} - \bar{X}_1)(X_{i2} - \bar{X}_2)\right]/2$

Note that if both $\beta_1$ and $\beta_2$ equal zero, $E(MSR) = \sigma^2$. Otherwise $E(MSR) > \sigma^2$.


TABLE 7.1  ANOVA table for general linear regression model (7.18)

Source of
Variation     SS                              df       MS

Regression    SSR = b'X'Y - (1/n)Y'11'Y       p - 1    MSR = SSR/(p - 1)

Error         SSE = Y'Y - b'X'Y               n - p    MSE = SSE/(n - p)

Total         SSTO = Y'Y - (1/n)Y'11'Y        n - 1


F  test  for  regression  relation 

To test whether there is a regression relation between the dependent variable Y
and the set of X variables $X_1, \ldots, X_{p-1}$, i.e., to choose between the alternatives:

(7.30a)  $H_0$: $\beta_1 = \beta_2 = \cdots = \beta_{p-1} = 0$
         $H_a$: not all $\beta_k$ ($k = 1, \ldots, p - 1$) equal 0

we use the test statistic:

(7.30b)  $F^* = \dfrac{MSR}{MSE}$

The decision rule to control the Type I error at $\alpha$ is:

(7.30c)  If $F^* \le F(1 - \alpha;\ p - 1,\ n - p)$, conclude $H_0$
         If $F^* > F(1 - \alpha;\ p - 1,\ n - p)$, conclude $H_a$
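
The test can be sketched in a few lines of Python. The function name, the input values,
and the practice of passing the critical value as an argument (read from a table of the
F distribution) are assumptions made for illustration.

```python
def f_test(ssr, sse, n, p, f_critical):
    """Return (F*, conclusion) for H0: beta_1 = ... = beta_{p-1} = 0."""
    msr = ssr / (p - 1)          # regression mean square
    mse = sse / (n - p)          # error mean square
    f_star = msr / mse           # test statistic (7.30b)
    conclusion = "Ha" if f_star > f_critical else "H0"   # decision rule (7.30c)
    return f_star, conclusion


# Hypothetical values: SSR = 25 and SSE = 1 from n = 5 cases with p = 3 parameters.
# With alpha = .05, the tabled percentile is F(.95; 2, 2) = 19.0.
f_star, conclusion = f_test(ssr=25.0, sse=1.0, n=5, p=3, f_critical=19.0)
print(f_star, conclusion)
```

Since $F^* = 12.5/.5 = 25$ exceeds 19.0, the sketch concludes $H_a$, that not all of the
$\beta_k$ equal zero.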


The  existence  of  a  regression  relation  by  itself  does  not  of  course  assure  that 
useful  predictions  can  be  made  by  using  it. 

Note that when $p - 1 = 1$, this test reduces to the F test in (3.61) for testing
in simple linear regression whether or not $\beta_1 = 0$.


Coefficient  of  multiple  determination 

The coefficient of multiple determination, denoted by $R^2$, is defined as follows:

(7.31)  $R^2 = \dfrac{SSR}{SSTO} = 1 - \dfrac{SSE}{SSTO}$

It measures the proportionate reduction of total variation in Y associated with the
use of the set of X variables $X_1, \ldots, X_{p-1}$. The coefficient of multiple determination
$R^2$ reduces to the coefficient of determination $r^2$ in (3.69) for simple linear
regression when $p - 1 = 1$, i.e., when one independent variable is in model
(7.18). Just as for $r^2$, we have:

(7.32)  $0 \le R^2 \le 1$

$R^2$ assumes the value 0 when all $b_k = 0$ ($k = 1, \ldots, p - 1$). $R^2$ takes on the value
1 when all observations fall directly on the fitted response surface, i.e., when
$Y_i = \hat{Y}_i$ for all $i$.

Comments 


1. To distinguish between the coefficients of determination for simple and multiple regression, we shall from now on call $r^2$ the coefficient of simple determination.

2. It can be shown that the coefficient of multiple determination $R^2$ can be viewed as a coefficient of simple determination $r^2$ between the responses $Y_i$ and the fitted values $\hat{Y}_i$.

3.  A  large  R2  does  not  necessarily  imply  that  the  fitted  model  is  a  useful  one.  For 
instance,  observations  may  have  been  taken  at  only  a  few  levels  of  the  independent 
variables.  Despite  a  high  R2  in  this  case,  the  fitted  model  may  not  be  useful  because  most 
predictions  would  require  extrapolations  outside  the  region  of  observations.  Again,  even 
though  R2  is  large,  MSE  may  still  be  too  large  for  inferences  to  be  useful  in  a  case  where 
high  precision  is  required. 

4. Adding more independent variables to the model can only increase $R^2$ and never reduce it, because SSE can never become larger with more independent variables and SSTO is always the same for a given set of responses. Since $R^2$ often can be made large by including a large number of independent variables, it is sometimes suggested that a modified measure be used which recognizes the number of independent variables in the model. This adjusted coefficient of multiple determination, denoted by $R_a^2$, is defined:

(7.33)  $R_a^2 = 1 - \left(\dfrac{n-1}{n-p}\right)\dfrac{SSE}{SSTO}$

This adjusted coefficient of multiple determination may actually become smaller when another independent variable is introduced into the model, because the decrease in SSE may be more than offset by the loss of a degree of freedom in the denominator $n - p$.
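As a rough illustration of definitions (7.31) and (7.33), the short sketch below evaluates both coefficients. The function names are hypothetical, and the inputs are the SSE, SSTO, n, and p values reported for the Zarthan Company example in Table 7.4 later in this chapter.

```python
def r_squared(sse: float, ssto: float) -> float:
    """Coefficient of multiple determination, as in (7.31)."""
    return 1.0 - sse / ssto


def adjusted_r_squared(sse: float, ssto: float, n: int, p: int) -> float:
    """Adjusted coefficient of multiple determination, as in (7.33)."""
    return 1.0 - (n - 1) / (n - p) * sse / ssto


# SSE, SSTO, n, and p as reported in Table 7.4 (Zarthan Company example).
SSE, SSTO, N, P = 56.884, 53901.600, 15, 3

r2 = r_squared(SSE, SSTO)
r2_adj = adjusted_r_squared(SSE, SSTO, N, P)
print(f"R2 = {r2:.4f}, adjusted R2 = {r2_adj:.4f}")
```

Because the factor (n - 1)/(n - p) exceeds 1 whenever p is greater than 1, the adjusted value is never larger than the unadjusted one.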

Coefficient  of  multiple  correlation 

The coefficient of multiple correlation R is the positive square root of $R^2$:

(7.34)  $R = \sqrt{R^2}$

It equals in absolute value the correlation coefficient $r$ in (3.71) for simple correlation when $p - 1 = 1$, i.e., when there is one independent variable in model (7.18).

Note 

From  now  on,  we  shall  call  r  the  coefficient  of  simple  correlation  to  distinguish  it  from 
the  coefficient  of  multiple  correlation. 


7.5  INFERENCES  ABOUT  REGRESSION  PARAMETERS 

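The least squares estimators in $\mathbf{b}$ are unbiased:

(7.35)  $E(\mathbf{b}) = \boldsymbol{\beta}$

The variance-covariance matrix $\sigma^2(\mathbf{b})$:

(7.36)  $\sigma^2(\mathbf{b}) = \begin{bmatrix}
\sigma^2(b_0) & \sigma(b_0, b_1) & \cdots & \sigma(b_0, b_{p-1}) \\
\sigma(b_1, b_0) & \sigma^2(b_1) & \cdots & \sigma(b_1, b_{p-1}) \\
\vdots & \vdots & & \vdots \\
\sigma(b_{p-1}, b_0) & \sigma(b_{p-1}, b_1) & \cdots & \sigma^2(b_{p-1})
\end{bmatrix}$

is given by:

(7.37)  $\sigma^2(\mathbf{b}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$  (a $p \times p$ matrix)

The estimated variance-covariance matrix $s^2(\mathbf{b})$:

(7.38)  $s^2(\mathbf{b}) = \begin{bmatrix}
s^2(b_0) & s(b_0, b_1) & \cdots & s(b_0, b_{p-1}) \\
s(b_1, b_0) & s^2(b_1) & \cdots & s(b_1, b_{p-1}) \\
\vdots & \vdots & & \vdots \\
s(b_{p-1}, b_0) & s(b_{p-1}, b_1) & \cdots & s^2(b_{p-1})
\end{bmatrix}$

is given by:

(7.39)  $s^2(\mathbf{b}) = MSE(\mathbf{X}'\mathbf{X})^{-1}$  (a $p \times p$ matrix)

From $s^2(\mathbf{b})$, one can obtain $s^2(b_0)$, $s^2(b_1)$, or whatever other variance is needed, or any needed covariances.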




Interval estimation of $\beta_k$

For the normal error model (7.18), we have:

(7.40)  $\dfrac{b_k - \beta_k}{s(b_k)} \sim t(n-p) \qquad k = 0, 1, \ldots, p-1$

Hence, the confidence limits for $\beta_k$ with $1 - \alpha$ confidence coefficient are:

(7.41)  $b_k \pm t(1-\alpha/2;\, n-p)\, s(b_k)$


Tests for $\beta_k$

Tests for $\beta_k$ are set up in the usual fashion. To test:

(7.42a)  $H_0: \beta_k = 0$
         $H_a: \beta_k \ne 0$

we may use the test statistic:

(7.42b)  $t^* = \dfrac{b_k}{s(b_k)}$

and the decision rule:

(7.42c)  If $|t^*| \le t(1-\alpha/2;\, n-p)$, conclude $H_0$
         Otherwise conclude $H_a$

The power of the t test can be obtained as explained in Chapter 3, with the degrees of freedom modified to $n - p$.

As with simple regression, the test whether or not $\beta_k = 0$ in multiple regression models can also be conducted by means of an F test. We discuss this test in Chapter 8.


Joint  inferences 


The boundary of the joint confidence region for all $p$ of the $\beta_k$ regression parameters $(k = 0, 1, \ldots, p-1)$ with confidence coefficient $1 - \alpha$ is:

(7.43)  $\dfrac{(\mathbf{b} - \boldsymbol{\beta})'\mathbf{X}'\mathbf{X}(\mathbf{b} - \boldsymbol{\beta})}{p\,MSE} = F(1-\alpha;\, p,\, n-p)$

The region defined by this boundary is generally difficult to obtain and interpret.

The Bonferroni joint confidence intervals, on the other hand, are easy to obtain and interpret. If $g$ parameters are to be estimated jointly (where $g \le p$), the confidence limits with family confidence coefficient $1 - \alpha$ are:

(7.44)  $b_k \pm B\,s(b_k)$

where:

(7.44a)  $B = t(1-\alpha/2g;\, n-p)$

In Section 8.4, we discuss tests concerning a subset of the regression parameters.




7.6  INFERENCES  ABOUT  MEAN  RESPONSE 


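Interval estimation of $E(Y_h)$

For given values of $X_1, \ldots, X_{p-1}$, denoted by $X_{h1}, \ldots, X_{h,p-1}$, the mean response is denoted by $E(Y_h)$. We define the vector $\mathbf{X}_h$:

(7.45)  $\mathbf{X}_h = \begin{bmatrix} 1 \\ X_{h1} \\ X_{h2} \\ \vdots \\ X_{h,p-1} \end{bmatrix}$

so that the mean response to be estimated is:

(7.46)  $E(Y_h) = \mathbf{X}_h'\boldsymbol{\beta}$

The estimated mean response corresponding to $\mathbf{X}_h$, denoted by $\hat{Y}_h$, is:

(7.47)  $\hat{Y}_h = \mathbf{X}_h'\mathbf{b}$

This estimator is unbiased:

(7.48)  $E(\hat{Y}_h) = \mathbf{X}_h'\boldsymbol{\beta} = E(Y_h)$

and its variance is:

(7.49)  $\sigma^2(\hat{Y}_h) = \sigma^2\mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h = \mathbf{X}_h'\sigma^2(\mathbf{b})\mathbf{X}_h$

Note that the variance $\sigma^2(\hat{Y}_h)$ is a function of the variances $\sigma^2(b_k)$ of the regression coefficients and of the covariances $\sigma(b_k, b_{k'})$ between pairs of regression coefficients, just as in simple linear regression. The estimated variance $s^2(\hat{Y}_h)$ is given by:

(7.50)  $s^2(\hat{Y}_h) = MSE(\mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h) = \mathbf{X}_h's^2(\mathbf{b})\mathbf{X}_h$

The $1 - \alpha$ confidence limits for $E(Y_h)$ are:

(7.51)  $\hat{Y}_h \pm t(1-\alpha/2;\, n-p)\, s(\hat{Y}_h)$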


Confidence  region  for  regression  surface 

The  1  —  a  confidence  region  for  the  entire  regression  surface  is  an  extension 
of  the  Working-Hotelling  confidence  band  for  the  regression  line  when  there  is 
one  independent  variable.  Boundary  points  of  the  confidence  region  at  Xh  are: 

(7.52)  $\hat{Y}_h \pm W s(\hat{Y}_h)$

where:

(7.52a)  $W^2 = pF(1-\alpha;\, p,\, n-p)$

The  confidence  coefficient  is  1  —  a  that  the  region  contains  the  entire  regression 
surface  over  all  combinations  of  real-numbered  values  of  the  X  variables. 




Simultaneous  confidence  intervals  for  several  mean  responses 

When it is desired to estimate a number of mean responses $E(Y_h)$ corresponding to different $\mathbf{X}_h$ vectors, one can employ two basic approaches:

1. Use the Working-Hotelling type confidence region bounds from (7.52) for the several $\mathbf{X}_h$ vectors of interest. Since these bounds cover the mean responses for all possible $\mathbf{X}_h$ vectors with confidence coefficient $1 - \alpha$, they will cover the mean responses for selected $\mathbf{X}_h$ vectors with confidence coefficient greater than $1 - \alpha$.

2. Use Bonferroni simultaneous confidence intervals. When $g$ statements are to be made with family confidence coefficient $1 - \alpha$, the Bonferroni confidence limits are:

(7.53)  $\hat{Y}_h \pm B s(\hat{Y}_h)$

where:

(7.53a)  $B = t(1-\alpha/2g;\, n-p)$

For  any  particular  application,  one  should  compare  W  and  B  to  see  which 
procedure  will  lead  to  narrower  confidence  intervals.  If  the  Xh  levels  are  not 
specified  in  advance  but  are  determined  as  the  analysis  proceeds,  it  is  better  to 
use  the  Working-Hotelling  type  limits  (7.52). 


F  test  for  lack  of  fit 

To test whether the response function:

(7.54)  $E(Y) = \beta_0 + \beta_1 X_1 + \cdots + \beta_{p-1} X_{p-1}$

is an appropriate response surface for the data at hand requires repeat observations, as for simple regression analysis. Repeat observations in multiple regression are replicate observations on Y corresponding to levels of each of the X variables which are constant from trial to trial. Thus, with two independent variables, repeat observations require that $X_1$ and $X_2$ each remain at given levels from trial to trial.

The procedures described in Chapter 4 for the F test for lack of fit are applicable to multiple regression. Once the ANOVA table, shown in Table 7.1, has been obtained, SSE is decomposed into pure error and lack of fit components. The pure error sum of squares SSPE is obtained by first calculating for each replicate group the sum of squared deviations of the Y observations around the group mean, where a replicate group has the same values for the X variables. Suppose there are $c$ replicate groups with distinct sets of levels for the X variables, and let the mean of the Y observations for the $j$th group be denoted by $\bar{Y}_j$. Then the sum of squares for the $j$th group is given by (4.8), and the pure error sum of squares is the sum of these sums of squares, as given by (4.9). The lack of fit sum of squares SSLF equals the difference SSE - SSPE, as indicated by (4.12).

The number of degrees of freedom associated with SSPE is $n - c$, and the number of degrees of freedom associated with SSLF is $(n - p) - (n - c) = c - p$.

The F test is conducted as described in Chapter 4, but with the degrees of freedom modified to those just stated.
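The decomposition just described can be sketched in a few lines. The example below is only an illustration: the data are hypothetical (a replicated 2 x 2 layout, not taken from this book), the function names are invented here, and the SSE value of 10.0 is assumed to come from a separate least squares fit of the first-order model to those same hypothetical data.

```python
from collections import defaultdict


def pure_error_ss(x_rows, y):
    """Return (SSPE, c): pure error sum of squares and number of replicate groups."""
    groups = defaultdict(list)
    for levels, obs in zip(x_rows, y):
        groups[tuple(levels)].append(obs)   # same X levels -> same replicate group
    sspe = 0.0
    for obs_list in groups.values():
        group_mean = sum(obs_list) / len(obs_list)
        sspe += sum((v - group_mean) ** 2 for v in obs_list)
    return sspe, len(groups)


def lack_of_fit(sse, sspe, n, p, c):
    """Return (SSLF, df for lack of fit, df for pure error, F*)."""
    sslf = sse - sspe
    df_lf, df_pe = c - p, n - c
    f_star = (sslf / df_lf) / (sspe / df_pe)
    return sslf, df_lf, df_pe, f_star


# Hypothetical data: two X variables, each combination of levels observed twice.
X_ROWS = [(1, 10), (1, 10), (2, 10), (2, 10), (1, 20), (1, 20), (2, 20), (2, 20)]
Y_OBS = [5, 7, 8, 10, 11, 13, 16, 18]
SSE_FROM_FIT = 10.0   # assumed result of fitting the first-order model (p = 3)

sspe, c = pure_error_ss(X_ROWS, Y_OBS)
sslf, df_lf, df_pe, f_star = lack_of_fit(SSE_FROM_FIT, sspe, n=len(Y_OBS), p=3, c=c)
print(sspe, c, sslf, df_lf, df_pe, f_star)
```

The resulting F* would then be compared with the appropriate percentile of the F distribution with c - p and n - c degrees of freedom.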

7.7  PREDICTIONS  OF  NEW  OBSERVATIONS 

Prediction of new observation $Y_{h(new)}$

The prediction limits with $1 - \alpha$ confidence coefficient for a new observation $Y_{h(new)}$ corresponding to $\mathbf{X}_h$, the specified values of the X variables, are:

(7.55)  $\hat{Y}_h \pm t(1-\alpha/2;\, n-p)\, s(Y_{h(new)})$

where:

(7.55a)  $s^2(Y_{h(new)}) = MSE + s^2(\hat{Y}_h) = MSE + \mathbf{X}_h's^2(\mathbf{b})\mathbf{X}_h = MSE(1 + \mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h)$


Prediction of mean of m new observations at $\mathbf{X}_h$

When $m$ new observations are to be selected at $\mathbf{X}_h$ and their mean $\bar{Y}_{h(new)}$ is to be predicted, the $1 - \alpha$ prediction limits are:

(7.56)  $\hat{Y}_h \pm t(1-\alpha/2;\, n-p)\, s(\bar{Y}_{h(new)})$

where:

(7.56a)  $s^2(\bar{Y}_{h(new)}) = \dfrac{MSE}{m} + s^2(\hat{Y}_h) = \dfrac{MSE}{m} + \mathbf{X}_h's^2(\mathbf{b})\mathbf{X}_h = MSE\left(\dfrac{1}{m} + \mathbf{X}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h\right)$

Predictions of g new observations

Simultaneous prediction limits for $g$ new observations at $g$ different levels of $\mathbf{X}_h$ with family confidence coefficient $1 - \alpha$ are given by:

(7.57)  $\hat{Y}_h \pm S s(Y_{h(new)})$

where:

(7.57a)  $S^2 = gF(1-\alpha;\, g,\, n-p)$

and $s^2(Y_{h(new)})$ is given by (7.55a).

Alternatively, the Bonferroni simultaneous prediction limits can be used. For $g$ predictions with a $1 - \alpha$ family confidence coefficient, they are:

(7.58)  $\hat{Y}_h \pm B s(Y_{h(new)})$

where:

(7.58a)  $B = t(1-\alpha/2g;\, n-p)$

A comparison of $S$ and $B$ in advance of any particular use will indicate which procedure will lead to narrower prediction intervals.
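The variance formulas (7.55a) and (7.56a) differ only in the MSE term, which a small sketch may make concrete. The function names are hypothetical; MSE, $s^2(\hat{Y}_h)$, $\hat{Y}_h$, and the t percentile are the values quoted for the Zarthan Company example later in this chapter, and m = 3 is an arbitrary choice made here for illustration.

```python
import math


def s2_new_observation(mse: float, s2_fit: float) -> float:
    """Estimated variance for predicting one new observation, as in (7.55a)."""
    return mse + s2_fit


def s2_mean_of_m(mse: float, s2_fit: float, m: int) -> float:
    """Estimated variance for predicting the mean of m new observations, as in (7.56a)."""
    return mse / m + s2_fit


def limits(center: float, multiplier: float, variance: float):
    """Return (lower, upper) limits: center +/- multiplier * sqrt(variance)."""
    half_width = multiplier * math.sqrt(variance)
    return center - half_width, center + half_width


MSE = 4.7403        # from the Zarthan Company example
S2_FIT = 0.46638    # s^2(Y-hat_h) at X_h1 = 220, X_h2 = 2,500
Y_HAT = 135.57
T_MULT = 2.179      # t(.975; 12), taken from the text rather than computed

s2_single = s2_new_observation(MSE, S2_FIT)
s2_mean3 = s2_mean_of_m(MSE, S2_FIT, m=3)    # m = 3 is an illustrative assumption
print(round(s2_single, 5), round(math.sqrt(s2_single), 5))
print(limits(Y_HAT, T_MULT, s2_single))
print(limits(Y_HAT, T_MULT, s2_mean3))
```

Averaging over m new observations shrinks only the MSE/m term, so the limits for the mean are narrower than those for a single observation but never narrower than the confidence limits for the mean response.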

7.8  AN  EXAMPLE— MULTIPLE  REGRESSION 
WITH  TWO  INDEPENDENT  VARIABLES 

In this section, we shall develop a multiple regression application with two independent variables. We shall illustrate a number of different types of inferences which might be made for this application but will not take up every possible type of inference.


Setting

The Zarthan Company sells a special skin cream through fashion stores exclusively. It operates in 15 marketing districts and is interested in predicting district sales. Table 7.2 contains data on sales by district, as well as district data on target population and per capita discretionary income. Sales are to be treated as the dependent variable Y, and target population and per capita discretionary income as independent variables $X_1$ and $X_2$, respectively, in an exploration of the feasibility of predicting district sales from target population and per capita discretionary income. The first-order model:

(7.59)  $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$

with normal error terms is expected to be appropriate.

TABLE 7.2  Basic data: Zarthan Company example

            Sales                     Target Population          Per Capita Discretionary
District    (gross of jars;           (thousands of persons)     Income (dollars)
   i        1 gross = 12 dozen)
            $Y_i$                     $X_{i1}$                   $X_{i2}$

   1          162                       274                        2,450
   2          120                       180                        3,254
   3          223                       375                        3,802
   4          131                       205                        2,838
   5           67                        86                        2,347
   6          169                       265                        3,782
   7           81                        98                        3,008
   8          192                       330                        2,450
   9          116                       195                        2,137
  10           55                        53                        2,560
  11          252                       430                        4,020
  12          232                       372                        4,427
  13          144                       236                        2,660
  14          103                       157                        2,088
  15          212                       370                        2,605



Basic  calculations 


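The Y and X matrices for the Zarthan Company illustration are shown in Table 7.3. We shall require:

1.  $\mathbf{X}'\mathbf{X} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 274 & 180 & \cdots & 370 \\ 2{,}450 & 3{,}254 & \cdots & 2{,}605 \end{bmatrix} \begin{bmatrix} 1 & 274 & 2{,}450 \\ 1 & 180 & 3{,}254 \\ \vdots & \vdots & \vdots \\ 1 & 370 & 2{,}605 \end{bmatrix}$

which yields:

(7.60)  $\mathbf{X}'\mathbf{X} = \begin{bmatrix} 15 & 3{,}626 & 44{,}428 \\ 3{,}626 & 1{,}067{,}614 & 11{,}419{,}181 \\ 44{,}428 & 11{,}419{,}181 & 139{,}063{,}428 \end{bmatrix}$

2.  $\mathbf{X}'\mathbf{Y} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 274 & 180 & \cdots & 370 \\ 2{,}450 & 3{,}254 & \cdots & 2{,}605 \end{bmatrix} \begin{bmatrix} 162 \\ 120 \\ \vdots \\ 212 \end{bmatrix}$

which yields:

(7.61)  $\mathbf{X}'\mathbf{Y} = \begin{bmatrix} 2{,}259 \\ 647{,}107 \\ 7{,}096{,}619 \end{bmatrix}$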


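TABLE 7.3  Y and X matrices: Zarthan Company example

$\mathbf{Y} = \begin{bmatrix} 162 \\ 120 \\ 223 \\ 131 \\ 67 \\ 169 \\ 81 \\ 192 \\ 116 \\ 55 \\ 252 \\ 232 \\ 144 \\ 103 \\ 212 \end{bmatrix}
\qquad
\mathbf{X} = \begin{bmatrix}
1 & 274 & 2{,}450 \\
1 & 180 & 3{,}254 \\
1 & 375 & 3{,}802 \\
1 & 205 & 2{,}838 \\
1 & 86 & 2{,}347 \\
1 & 265 & 3{,}782 \\
1 & 98 & 3{,}008 \\
1 & 330 & 2{,}450 \\
1 & 195 & 2{,}137 \\
1 & 53 & 2{,}560 \\
1 & 430 & 4{,}020 \\
1 & 372 & 4{,}427 \\
1 & 236 & 2{,}660 \\
1 & 157 & 2{,}088 \\
1 & 370 & 2{,}605
\end{bmatrix}$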




3.  $(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} 15 & 3{,}626 & 44{,}428 \\ 3{,}626 & 1{,}067{,}614 & 11{,}419{,}181 \\ 44{,}428 & 11{,}419{,}181 & 139{,}063{,}428 \end{bmatrix}^{-1}$

Using (6.25), we define:

a = 15          b = 3,626          c = 44,428
d = 3,626       e = 1,067,614      f = 11,419,181
g = 44,428      h = 11,419,181     k = 139,063,428

so that:

Z = 14,497,044,060,000
A = 1.246348416
B = .0002129664176

and so on. We obtain:

(7.62)  $(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix}
1.2463484 & 2.1296642\text{E}-4 & -4.1567125\text{E}-4 \\
2.1296642\text{E}-4 & 7.7329030\text{E}-6 & -7.0302518\text{E}-7 \\
-4.1567125\text{E}-4 & -7.0302518\text{E}-7 & 1.9771851\text{E}-7
\end{bmatrix}$

Note that some of the results in the $(\mathbf{X}'\mathbf{X})^{-1}$ matrix are given in the E format, where, say, E - 4 stands for $10^{-4} = 1/10^4$. Thus, 2.1296642E - 4 stands for .00021296642.
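The three basic calculations can be reproduced with a short script. What follows is a minimal sketch using only the Python standard library: the helper names are hypothetical, and the inverse is found by Gauss-Jordan elimination on exact fractions, which is a swapped-in technique and not the determinant-based formula (6.25) used in the text.

```python
from fractions import Fraction

# Table 7.2 data (Zarthan Company example).
Y = [162, 120, 223, 131, 67, 169, 81, 192, 116, 55, 252, 232, 144, 103, 212]
X1 = [274, 180, 375, 205, 86, 265, 98, 330, 195, 53, 430, 372, 236, 157, 370]
X2 = [2450, 3254, 3802, 2838, 2347, 3782, 3008, 2450, 2137, 2560,
      4020, 4427, 2660, 2088, 2605]

# Design matrix X with a leading column of ones for the intercept.
X = [[1, x1, x2] for x1, x2 in zip(X1, X2)]


def cross_products(x_rows, y):
    """Return X'X (nested list) and X'Y (list), computed with exact integers."""
    p = len(x_rows[0])
    xtx = [[sum(r[j] * r[k] for r in x_rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * obs for r, obs in zip(x_rows, y)) for j in range(p)]
    return xtx, xty


def invert(matrix):
    """Invert a small nonsingular matrix by Gauss-Jordan elimination on fractions."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


XTX, XTY = cross_products(X, Y)
XTX_INV = invert(XTX)
for row in XTX_INV:
    print(["%.7E" % float(v) for v in row])
```

Exact fractions are used here because the elements of X'X differ by many orders of magnitude, which is the kind of situation where rounding errors can accumulate in floating-point inversion.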


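Algebraic equivalents. Note that $\mathbf{X}'\mathbf{X}$ for the first-order model (7.59) with two independent variables is:

$\mathbf{X}'\mathbf{X} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_{11} & X_{21} & \cdots & X_{n1} \\ X_{12} & X_{22} & \cdots & X_{n2} \end{bmatrix} \begin{bmatrix} 1 & X_{11} & X_{12} \\ 1 & X_{21} & X_{22} \\ \vdots & \vdots & \vdots \\ 1 & X_{n1} & X_{n2} \end{bmatrix}$

or:

(7.63)  $\mathbf{X}'\mathbf{X} = \begin{bmatrix} n & \sum X_{i1} & \sum X_{i2} \\ \sum X_{i1} & \sum X_{i1}^2 & \sum X_{i1}X_{i2} \\ \sum X_{i2} & \sum X_{i2}X_{i1} & \sum X_{i2}^2 \end{bmatrix}$

Thus, for our example:

$n = 15$
$\sum X_{i1} = 274 + 180 + \cdots = 3{,}626$
$\sum X_{i1}X_{i2} = 274(2{,}450) + 180(3{,}254) + \cdots = 11{,}419{,}181$
etc.

These elements are found in (7.60).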

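Also note that $\mathbf{X}'\mathbf{Y}$ for the first-order model with two independent variables is:

(7.64)  $\mathbf{X}'\mathbf{Y} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_{11} & X_{21} & \cdots & X_{n1} \\ X_{12} & X_{22} & \cdots & X_{n2} \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} \sum Y_i \\ \sum X_{i1}Y_i \\ \sum X_{i2}Y_i \end{bmatrix}$

For our example, we have:

$\sum Y_i = 162 + 120 + \cdots = 2{,}259$
$\sum X_{i1}Y_i = 274(162) + 180(120) + \cdots = 647{,}107$
$\sum X_{i2}Y_i = 2{,}450(162) + 3{,}254(120) + \cdots = 7{,}096{,}619$

These are the elements found in (7.61).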


Estimated  regression  function 


The least squares estimates $\mathbf{b}$ are readily obtained by (7.21), given our basic calculations in (7.61) and (7.62):

$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = \begin{bmatrix}
1.2463484 & 2.1296642\text{E}-4 & -4.1567125\text{E}-4 \\
2.1296642\text{E}-4 & 7.7329030\text{E}-6 & -7.0302518\text{E}-7 \\
-4.1567125\text{E}-4 & -7.0302518\text{E}-7 & 1.9771851\text{E}-7
\end{bmatrix} \begin{bmatrix} 2{,}259 \\ 647{,}107 \\ 7{,}096{,}619 \end{bmatrix}$

Thus:

$\mathbf{b} = \begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} 3.4526127900 \\ .4960049761 \\ .009199080867 \end{bmatrix}$

and the estimated regression function is:

$\hat{Y} = 3.453 + .496X_1 + .00920X_2$

This estimated regression function indicates that mean sales are expected to increase by .496 gross when the target population increases by one thousand, holding per capita discretionary income constant, and that mean sales are expected to increase by .0092 gross when per capita discretionary income increases by one dollar, holding population constant.
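The estimates can also be obtained by solving the normal equations $(\mathbf{X}'\mathbf{X})\mathbf{b} = \mathbf{X}'\mathbf{Y}$ directly, without forming the inverse. The sketch below is one way to do that with Gaussian elimination on exact fractions; the function name is hypothetical, and the inputs are the matrices in (7.60) and (7.61).

```python
from fractions import Fraction

# X'X from (7.60) and X'Y from (7.61).
XTX = [[15, 3626, 44428],
       [3626, 1067614, 11419181],
       [44428, 11419181, 139063428]]
XTY = [2259, 647107, 7096619]


def solve_normal_equations(xtx, xty):
    """Solve (X'X) b = X'Y by Gaussian elimination with back substitution."""
    n = len(xty)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)]
           for row, rhs in zip(xtx, xty)]
    # Forward elimination: zero out the entries below each pivot.
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # Back substitution, starting from the last equation.
    b = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * b[j] for j in range(i + 1, n))
        b[i] = (aug[i][n] - known) / aug[i][i]
    return [float(v) for v in b]


b0, b1, b2 = solve_normal_equations(XTX, XTY)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, b2 = {b2:.6f}")
```

Solving the system directly gives the same estimates as multiplying by $(\mathbf{X}'\mathbf{X})^{-1}$; the inverse is still needed later for the variance-covariance matrix of the estimates.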

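Algebraic version of normal equations. The normal equations in algebraic form for the case of two independent variables can be obtained readily from (7.63) and (7.64). We have:

$(\mathbf{X}'\mathbf{X})\mathbf{b} = \mathbf{X}'\mathbf{Y}$

$\begin{bmatrix} n & \sum X_{i1} & \sum X_{i2} \\ \sum X_{i1} & \sum X_{i1}^2 & \sum X_{i1}X_{i2} \\ \sum X_{i2} & \sum X_{i2}X_{i1} & \sum X_{i2}^2 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} \sum Y_i \\ \sum X_{i1}Y_i \\ \sum X_{i2}Y_i \end{bmatrix}$

from which we obtain the normal equations:

(7.65)  $\sum Y_i = nb_0 + b_1\sum X_{i1} + b_2\sum X_{i2}$
        $\sum X_{i1}Y_i = b_0\sum X_{i1} + b_1\sum X_{i1}^2 + b_2\sum X_{i1}X_{i2}$
        $\sum X_{i2}Y_i = b_0\sum X_{i2} + b_1\sum X_{i1}X_{i2} + b_2\sum X_{i2}^2$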


Analysis  of  aptness  of  model 

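To examine the aptness of regression model (7.59) with independent variables $X_1$ and $X_2$ for the data at hand, we require the fitted values $\hat{Y}_i$ and the residuals $e_i = Y_i - \hat{Y}_i$. We obtain by (7.23):

$\hat{\mathbf{Y}} = \mathbf{X}\mathbf{b}$

$\begin{bmatrix} \hat{Y}_1 \\ \hat{Y}_2 \\ \vdots \\ \hat{Y}_{15} \end{bmatrix} = \begin{bmatrix} 1 & 274 & 2{,}450 \\ 1 & 180 & 3{,}254 \\ \vdots & \vdots & \vdots \\ 1 & 370 & 2{,}605 \end{bmatrix} \begin{bmatrix} 3.4526127900 \\ .4960049761 \\ .009199080867 \end{bmatrix} = \begin{bmatrix} 161.896 \\ 122.667 \\ \vdots \\ 210.938 \end{bmatrix}$

Further, by (7.24) we find:

$\mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}}$

$\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_{15} \end{bmatrix} = \begin{bmatrix} 162 \\ 120 \\ \vdots \\ 212 \end{bmatrix} - \begin{bmatrix} 161.896 \\ 122.667 \\ \vdots \\ 210.938 \end{bmatrix} = \begin{bmatrix} .104 \\ -2.667 \\ \vdots \\ 1.062 \end{bmatrix}$

Figures 7.6, 7.7, and 7.8 contain plots of the residuals $e_i$ against the fitted values $\hat{Y}_i$, against $X_{i1}$, and against $X_{i2}$, respectively. These plots were generated by the BMDP computer package. There are no suggestions in any of these plots that systematic deviations from the fitted response plane are present, nor that the error variance varies either with the level of $\hat{Y}$ or with the levels of $X_1$ or $X_2$. We do not show a normal probability plot, but it does not indicate any major departure from normality. Hence, model (7.59) appears to be apt for this application.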
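The fitted values and residuals for all 15 districts follow from the same matrix product. A minimal sketch, assuming the data of Table 7.2 and the estimates $b_0$, $b_1$, $b_2$ quoted above (the function name is hypothetical):

```python
# Table 7.2 data (Zarthan Company example).
Y = [162, 120, 223, 131, 67, 169, 81, 192, 116, 55, 252, 232, 144, 103, 212]
X1 = [274, 180, 375, 205, 86, 265, 98, 330, 195, 53, 430, 372, 236, 157, 370]
X2 = [2450, 3254, 3802, 2838, 2347, 3782, 3008, 2450, 2137, 2560,
      4020, 4427, 2660, 2088, 2605]

# Least squares estimates b0, b1, b2 as given in the text.
B = (3.4526127900, 0.4960049761, 0.009199080867)


def fitted_and_residuals(y, x1, x2, b):
    """Return the fitted values Y-hat = Xb and the residuals e = Y - Y-hat."""
    fitted = [b[0] + b[1] * u + b[2] * v for u, v in zip(x1, x2)]
    residuals = [obs - fit for obs, fit in zip(y, fitted)]
    return fitted, residuals


fitted, residuals = fitted_and_residuals(Y, X1, X2, B)
for i, (fit, e) in enumerate(zip(fitted, residuals), start=1):
    print(f"{i:2d}  {fit:9.3f}  {e:7.3f}")
print("sum of residuals:", round(sum(residuals), 6))
```

A quick check on such a computation is that the residuals of a least squares fit with an intercept term sum to zero, apart from rounding in the estimates.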




FIGURE 7.6  Residual plot against $\hat{Y}$: Zarthan Company example
[Plot not reproduced. Vertical axis: residual, scaled from -3.75 to 3.00; horizontal axis: PREDICTD (fitted value), scaled from 50 to 275.]


Analysis  of  variance 

To test whether sales are related to population and per capita discretionary income, we construct the ANOVA table in Table 7.4. The basic quantities needed are:

$\mathbf{Y}'\mathbf{Y} = \begin{bmatrix} 162 & 120 & \cdots & 212 \end{bmatrix} \begin{bmatrix} 162 \\ 120 \\ \vdots \\ 212 \end{bmatrix} = (162)^2 + (120)^2 + \cdots + (212)^2 = 394{,}107.000$




FIGURE 7.7  Residual plot against $X_1$: Zarthan Company example
[Plot not reproduced. Vertical axis: RESIDUAL, scaled from -3.75 to 3.00; horizontal axis: TARGET (target population), scaled from 80 to 440.]


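$\dfrac{1}{n}\mathbf{Y}'\mathbf{1}\mathbf{1}'\mathbf{Y} = \dfrac{1}{15}\begin{bmatrix} 162 & 120 & \cdots & 212 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix} \begin{bmatrix} 162 \\ 120 \\ \vdots \\ 212 \end{bmatrix} = \dfrac{1}{n}\left(\sum Y_i\right)\left(\sum Y_i\right) = \dfrac{(2{,}259)^2}{15} = 340{,}205.400$

Thus:

$SSTO = \mathbf{Y}'\mathbf{Y} - \dfrac{1}{n}\mathbf{Y}'\mathbf{1}\mathbf{1}'\mathbf{Y} = 394{,}107.000 - 340{,}205.400 = 53{,}901.600$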




FIGURE 7.8  Residual plot against $X_2$: Zarthan Company example
[Plot not reproduced. Vertical axis: RESIDUAL, scaled from -3.75 to 3.00; horizontal axis: INCOME (per capita discretionary income), scaled from 2250 to 4500.]

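and using our result in (7.61):

$SSE = \mathbf{Y}'\mathbf{Y} - \mathbf{b}'\mathbf{X}'\mathbf{Y} = 394{,}107.000 - \begin{bmatrix} 3.4526127900 & .4960049761 & .009199080867 \end{bmatrix} \begin{bmatrix} 2{,}259 \\ 647{,}107 \\ 7{,}096{,}619 \end{bmatrix}$

$= 394{,}107.000 - 394{,}050.116 = 56.884$

Finally, we obtain by subtraction:

$SSR = SSTO - SSE = 53{,}901.600 - 56.884 = 53{,}844.716$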

The degrees of freedom and mean squares are entered in Table 7.4. Note that three regression parameters had to be estimated; hence, 15 - 3 = 12 degrees of freedom are associated with SSE. Also, the number of degrees of freedom associated with SSR is two, the number of X variables in the model.

TABLE 7.4  ANOVA table: Zarthan Company example

Source of
Variation      SS                    df     MS
Regression     SSR  = 53,844.716      2     MSR = 26,922.358
Error          SSE  =     56.884     12     MSE =      4.740
Total          SSTO = 53,901.600     14


Test of regression relation. To test whether sales are related to population and per capita discretionary income:

$H_0: \beta_1 = 0$ and $\beta_2 = 0$
$H_a$: not both $\beta_1$ and $\beta_2$ equal 0

we use test statistic (7.30b):

$F^* = \dfrac{MSR}{MSE} = \dfrac{26{,}922.358}{4.740} = 5{,}680$

Assuming $\alpha$ is to be .05, we require F(.95; 2, 12) = 3.89. Since $F^* = 5{,}680 > 3.89$, we conclude $H_a$, that sales are related to population and per capita discretionary income. Whether this relation is useful for making predictions of sales or estimates of mean sales still remains to be seen.

The P-value for this test is less than .001 since we note from Table A-4 that F(.999; 2, 12) = 13.0. In fact, it can be shown that the P-value is 0+.
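The sums of squares and the F statistic can be recomputed from the matrix quantities already in hand. The sketch below is illustrative only: the function name and dictionary keys are chosen here, the inputs are Y'Y, the sum of the $Y_i$, X'Y, and the estimates b from the text, and the critical value 3.89 is the tabled F(.95; 2, 12) quoted above rather than something the code computes.

```python
N, P = 15, 3                       # observations and regression parameters
YTY = 394107.0                     # Y'Y
SUM_Y = 2259.0                     # sum of the Y observations
XTY = [2259.0, 647107.0, 7096619.0]                    # X'Y from (7.61)
B = [3.4526127900, 0.4960049761, 0.009199080867]       # least squares estimates


def anova(yty, sum_y, xty, b, n, p):
    """Return the ANOVA quantities of Table 7.1 as a dictionary."""
    ssto = yty - sum_y ** 2 / n                        # Y'Y - (1/n) Y'11'Y
    sse = yty - sum(bk * v for bk, v in zip(b, xty))   # Y'Y - b'X'Y
    ssr = ssto - sse
    msr = ssr / (p - 1)
    mse = sse / (n - p)
    return {"SSTO": ssto, "SSE": sse, "SSR": ssr,
            "MSR": msr, "MSE": mse, "F": msr / mse}


table = anova(YTY, SUM_Y, XTY, B, N, P)
F_CRITICAL = 3.89                  # F(.95; 2, 12), taken from the text
for name, value in table.items():
    print(f"{name:5s} {value:14.3f}")
print("conclude Ha" if table["F"] > F_CRITICAL else "conclude H0")
```

Small differences from the hand calculation in the last digits are to be expected, since the text rounds MSE to 4.740 before dividing.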


Coefficient of multiple determination. For our example, we have by (7.31):

$R^2 = \dfrac{SSR}{SSTO} = \dfrac{53{,}844.716}{53{,}901.600} = .9989$

Thus, when the two independent variables population and per capita discretionary income are considered, the variation in sales is reduced by 99.9 percent.


Algebraic expression for SSE. The error sum of squares for the case of two independent variables in algebraic terms is:

$SSE = \mathbf{Y}'\mathbf{Y} - \mathbf{b}'\mathbf{X}'\mathbf{Y} = \sum Y_i^2 - \begin{bmatrix} b_0 & b_1 & b_2 \end{bmatrix} \begin{bmatrix} \sum Y_i \\ \sum X_{i1}Y_i \\ \sum X_{i2}Y_i \end{bmatrix}$

or:

(7.66)  $SSE = \sum Y_i^2 - b_0\sum Y_i - b_1\sum X_{i1}Y_i - b_2\sum X_{i2}Y_i$

Note how this expression is a straightforward extension of (2.24a) for the case of one independent variable.

Estimation  of  regression  parameters 

The Zarthan Company is not interested in the parameter $\beta_0$ since it falls far outside the scope of the model. It is desired to estimate $\beta_1$ and $\beta_2$ jointly with a family confidence coefficient .90. We shall use the simultaneous Bonferroni confidence limits in (7.44), since these are easy to develop and interpret.

First, we need the estimated variance-covariance matrix $s^2(\mathbf{b})$:

$s^2(\mathbf{b}) = MSE(\mathbf{X}'\mathbf{X})^{-1}$

MSE is given in Table 7.4, and $(\mathbf{X}'\mathbf{X})^{-1}$ was obtained in (7.62). Hence:

(7.67)  $s^2(\mathbf{b}) = 4.7403 \begin{bmatrix}
1.2463484 & 2.1296642\text{E}-4 & -4.1567125\text{E}-4 \\
2.1296642\text{E}-4 & 7.7329030\text{E}-6 & -7.0302518\text{E}-7 \\
-4.1567125\text{E}-4 & -7.0302518\text{E}-7 & 1.9771851\text{E}-7
\end{bmatrix} = \begin{bmatrix}
5.9081 & 1.0095\text{E}-3 & -1.9704\text{E}-3 \\
1.0095\text{E}-3 & 3.6656\text{E}-5 & -3.3326\text{E}-6 \\
-1.9704\text{E}-3 & -3.3326\text{E}-6 & 9.3725\text{E}-7
\end{bmatrix}$

The two elements we require are:

$s^2(b_1) = .000036656$  or  $s(b_1) = .006054$
$s^2(b_2) = .00000093725$  or  $s(b_2) = .0009681$

Next, we require for $g = 2$ simultaneous estimates:

$B = t(1 - .10/2(2);\, 12) = t(.975;\, 12) = 2.179$

Now we are ready to obtain the two simultaneous confidence intervals:

$.4960 - 2.179(.006054) \le \beta_1 \le .4960 + 2.179(.006054)$

or:

$.483 \le \beta_1 \le .509$

$.009199 - 2.179(.0009681) \le \beta_2 \le .009199 + 2.179(.0009681)$

or:

$.0071 \le \beta_2 \le .0113$

With family confidence coefficient .90, we conclude that $\beta_1$ falls between .483 and .509 and that $\beta_2$ falls between .0071 and .0113.

Note that the simultaneous confidence intervals suggest that both $\beta_1$ and $\beta_2$ are positive, which is in accord with theoretical expectations that sales should increase with either higher target population or higher per capita discretionary income, the other variable being held constant.
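The interval arithmetic above is easy to script once $s^2(\mathbf{b})$ is available. The following sketch uses the rounded entries of (7.67) and the rounded estimates from the text; the function name is hypothetical, and the multiplier 2.179 is the tabled t(.975; 12) quoted above, not a value computed by the code.

```python
import math

# Estimated variance-covariance matrix s^2(b) from (7.67).
S2_B = [[5.9081, 1.0095e-3, -1.9704e-3],
        [1.0095e-3, 3.6656e-5, -3.3326e-6],
        [-1.9704e-3, -3.3326e-6, 9.3725e-7]]
B_EST = [3.4526, 0.4960, 0.009199]   # b0, b1, b2 (rounded as in the text)
B_MULT = 2.179                       # B = t(1 - .10/2(2); 12) = t(.975; 12)


def bonferroni_intervals(estimates, s2_matrix, indices, multiplier):
    """Return {k: (lower, upper)} for the selected regression coefficients."""
    intervals = {}
    for k in indices:
        std_dev = math.sqrt(s2_matrix[k][k])   # s(b_k) from the diagonal
        half_width = multiplier * std_dev
        intervals[k] = (estimates[k] - half_width, estimates[k] + half_width)
    return intervals


# Joint intervals for beta_1 and beta_2 (g = 2), family confidence .90.
joint = bonferroni_intervals(B_EST, S2_B, indices=[1, 2], multiplier=B_MULT)
for k, (lower, upper) in joint.items():
    print(f"beta_{k}: {lower:.4f} to {upper:.4f}")
```

Only the diagonal of $s^2(\mathbf{b})$ enters these intervals; the covariances matter for the mean response and prediction calculations that follow.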


Estimation  of  mean  response 


Suppose the Zarthan Company would like to estimate expected (mean) sales in a district with target population $X_{h1} = 220$ thousand persons and per capita discretionary income $X_{h2} = 2{,}500$ dollars. We define:

$\mathbf{X}_h = \begin{bmatrix} 1 \\ 220 \\ 2{,}500 \end{bmatrix}$

The point estimate of mean sales is by (7.47):

$\hat{Y}_h = \mathbf{X}_h'\mathbf{b} = \begin{bmatrix} 1 & 220 & 2{,}500 \end{bmatrix} \begin{bmatrix} 3.4526 \\ .4960 \\ .009199 \end{bmatrix} = 135.57$

The estimated variance by (7.50) and using the results in (7.67) is:

$s^2(\hat{Y}_h) = \mathbf{X}_h's^2(\mathbf{b})\mathbf{X}_h = \begin{bmatrix} 1 & 220 & 2{,}500 \end{bmatrix} \begin{bmatrix}
5.9081 & 1.0095\text{E}-3 & -1.9704\text{E}-3 \\
1.0095\text{E}-3 & 3.6656\text{E}-5 & -3.3326\text{E}-6 \\
-1.9704\text{E}-3 & -3.3326\text{E}-6 & 9.3725\text{E}-7
\end{bmatrix} \begin{bmatrix} 1 \\ 220 \\ 2{,}500 \end{bmatrix} = .46638$

or:

$s(\hat{Y}_h) = .68292$

Assume that the confidence coefficient for the interval estimate of $E(Y_h)$ is to be .95. We then need t(.975; 12) = 2.179, and obtain by (7.51):

$135.57 - 2.179(.68292) \le E(Y_h) \le 135.57 + 2.179(.68292)$

or:

$134.1 \le E(Y_h) \le 137.1$

Thus, with confidence coefficient .95, we estimate that mean sales in a district with target population of 220 thousand and per capita discretionary income of $2,500 are somewhere between 134.1 and 137.1 gross.


Algebraic version of estimated variance $s^2(\hat{Y}_h)$. Since by (7.50):

$s^2(\hat{Y}_h) = \mathbf{X}_h's^2(\mathbf{b})\mathbf{X}_h$

it follows for the case of two independent variables:

(7.68)  $s^2(\hat{Y}_h) = s^2(b_0) + X_{h1}^2 s^2(b_1) + X_{h2}^2 s^2(b_2) + 2X_{h1}s(b_0, b_1) + 2X_{h2}s(b_0, b_2) + 2X_{h1}X_{h2}s(b_1, b_2)$

When we substitute in (7.68), utilizing the estimated variances and covariances from (7.67), we obtain the same result as before, namely, $s^2(\hat{Y}_h) = .46638$.
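Both routes to $s^2(\hat{Y}_h)$ amount to evaluating the same quadratic form. The sketch below evaluates it directly and then forms the confidence limits of (7.51); the function name is hypothetical, the matrix entries are the rounded values from (7.67), and 2.179 is the tabled t(.975; 12) quoted in the text.

```python
import math

S2_B = [[5.9081, 1.0095e-3, -1.9704e-3],
        [1.0095e-3, 3.6656e-5, -3.3326e-6],
        [-1.9704e-3, -3.3326e-6, 9.3725e-7]]   # s^2(b) from (7.67)
B_EST = [3.4526, 0.4960, 0.009199]             # b0, b1, b2 (rounded)
X_H = [1, 220, 2500]                           # X_h for the district of interest
T_MULT = 2.179                                 # t(.975; 12)


def quadratic_form(x, matrix):
    """Return x' M x, the sum over j and k of x_j * M_jk * x_k."""
    return sum(x[j] * matrix[j][k] * x[k]
               for j in range(len(x)) for k in range(len(x)))


y_hat = sum(b * x for b, x in zip(B_EST, X_H))   # point estimate, (7.47)
s2_fit = quadratic_form(X_H, S2_B)               # estimated variance, (7.50)
s_fit = math.sqrt(s2_fit)
lower, upper = y_hat - T_MULT * s_fit, y_hat + T_MULT * s_fit
print(f"Y-hat = {y_hat:.2f}, s2 = {s2_fit:.5f}, limits = ({lower:.1f}, {upper:.1f})")
```

The individual terms of the quadratic form are much larger than their sum, so the result is sensitive to how far the entries of $s^2(\mathbf{b})$ have been rounded.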


Prediction  limits  for  new  observations 

Suppose the Zarthan Company would like to predict sales in two districts. The two districts have the following characteristics:

              District A    District B
$X_{h1}$         220           375
$X_{h2}$       2,500         3,500

To determine which simultaneous prediction intervals are best here, we shall find $S$ as given in (7.57a) and $B$ as given in (7.58a) for $g = 2$, assuming the family confidence coefficient is to be .90:

$S^2 = 2F(.90;\, 2,\, 12) = 2(2.81) = 5.62$

or:

$S = 2.37$

and

$B = t(1 - .10/2(2);\, 12) = t(.975;\, 12) = 2.179$

Hence, the Bonferroni limits are more efficient here.

For district A, we shall use the results we found when estimating mean sales, since the levels of the independent variables are the same as before. We have from earlier:

$\hat{Y}_A = 135.57 \qquad s^2(\hat{Y}_A) = .46638 \qquad MSE = 4.7403$

Hence, by (7.55a):

$s^2(Y_{A(new)}) = MSE + s^2(\hat{Y}_A) = 4.7403 + .46638 = 5.20668$

or:

$s(Y_{A(new)}) = 2.28182$

In similar fashion, we obtain:

$\hat{Y}_B = 221.65 \qquad s(Y_{B(new)}) = 2.34536$

We found before that $B = 2.179$. Hence, the simultaneous Bonferroni prediction intervals with family confidence coefficient .90 are by (7.58):

$135.57 - 2.179(2.28182) \le Y_{A(new)} \le 135.57 + 2.179(2.28182)$

or:

$130.6 \le Y_{A(new)} \le 140.5$

$221.65 - 2.179(2.34536) \le Y_{B(new)} \le 221.65 + 2.179(2.34536)$

or:

$216.5 \le Y_{B(new)} \le 226.8$

With family confidence coefficient .90, we predict that sales in the two districts will be within the indicated limits. The Zarthan Company considers these prediction limits sufficiently precise, and hence useful.
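The choice between the two simultaneous procedures, and the resulting limits, can be sketched as follows. The variable and function names are hypothetical; the tabled values F(.90; 2, 12) = 2.81 and t(.975; 12) = 2.179, the point predictions, and the standard deviations are all taken from the text rather than computed by the code.

```python
import math

G = 2                                  # number of new observations to predict
F_TABLED = 2.81                        # F(.90; 2, 12), from the text
T_TABLED = 2.179                       # t(1 - .10/2(2); 12) = t(.975; 12)

S_MULT = math.sqrt(G * F_TABLED)       # S as in (7.57a)
B_MULT = T_TABLED                      # B as in (7.58a)
multiplier = min(S_MULT, B_MULT)       # the smaller multiplier gives narrower limits

# Point predictions and s(Y_h(new)) for the two districts, from the text.
DISTRICTS = {"A": (135.57, 2.28182), "B": (221.65, 2.34536)}


def prediction_limits(y_hat, s_pred, mult):
    """Return (lower, upper) simultaneous prediction limits."""
    return y_hat - mult * s_pred, y_hat + mult * s_pred


limits_by_district = {name: prediction_limits(y_hat, s_pred, multiplier)
                      for name, (y_hat, s_pred) in DISTRICTS.items()}
print(f"S = {S_MULT:.2f}, B = {B_MULT:.3f}")
for name, (lower, upper) in limits_by_district.items():
    print(f"district {name}: {lower:.1f} to {upper:.1f}")
```

With only two predictions the Bonferroni multiplier is the smaller one; which procedure is narrower in other settings depends on g, so the comparison would need to be repeated with the appropriate tabled values.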


Computer  printout 

Figure  7.9  contains  an  illustrative  computer  printout  for  the  Zarthan  Company 
example,  obtained  by  using  the  GLM  (general  linear  model)  program  of  the  SAS 
(Statistical  Analysis  System)  computer  package  (Ref.  7.1).  Regression  analysis 
printouts  differ  in  format  from  one  computer  program  to  another,  as  may  be  seen 
by  comparing  the  output  in  Figure  7.9  with  other  outputs  presented  in  earlier 
chapters.  However,  the  basic  information  presented  in  the  different  outputs  is 
essentially  the  same  for  the  major  statistical  regression  packages. 

We  have  annotated  the  output  in  Figure  7.9  to  tie  in  with  the  notation  of  this 
book.  The  first  two  blocks  of  information  contain  intermediate  regression  analy¬ 
sis  results  in  matrix  form,  specifically  the  X'X  and  (X'X)-1  matrices.  The  label 
“intercept”  in  these  matrices  refers  to  Xi0=  1  in  the  alternative  regression 
model  (7.7b). 

The  next  block  presents  information  about  the  estimated  regression  coeffi¬ 
cients  bk.  Shown,  in  turn,  are  the  estimates  bk,  the  test  statistics  t*  =  bk/s(bk)  for 
testing  whether  or  not  fik  =  0,  the  two-sided  P-values  for  the  test  statistics,  and 
the  estimated  standard  deviations  s(hk). 

The  fourth  block  contains  ANOVA  information:  the  ANOVA  table,  the  F* 
value  for  the  test  of  whether  or  not  a  regression  relation  exists,  the  P- value  for 
this  test,  VMSE,  and  R2. 

The  final  block  shows  the  observed  values  Yt,  the  fitted  values  Yh  and  the 
residuals  et. 

Because of rounding, some results in Figure 7.9 do not coincide precisely
with the corresponding results given earlier. In this connection, it should be
noted that different computer regression packages may lead to somewhat
different results because final results are rounded to different extents, and even
more importantly because rounding errors are not handled equally well by all
packages. Particularly when there are a number of independent variables, some of
which are highly correlated, rounding errors can be a serious source of difficulty.
It is a wise policy to investigate a computer regression package before using it,
for instance, by comparing its output for a test problem against results known to
be accurate.
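
One way to carry out such a check is sketched below in Python with NumPy (our choice of tool, not one discussed in the text). The test problem, the helper names, and the choice of solvers are all illustrative assumptions. The response is constructed without error from known coefficients, and the two independent variables are made nearly collinear, so any accurate routine should reproduce the known coefficients closely.

```python
import numpy as np

def make_test_problem(n=20):
    """Build X (with intercept column) and an error-free Y from known coefficients."""
    x1 = np.arange(1.0, n + 1.0)
    # x2 is deliberately almost collinear with x1 to stress the arithmetic.
    x2 = x1 + 0.01 * np.sin(x1)
    X = np.column_stack([np.ones(n), x1, x2])
    beta_known = np.array([5.0, 2.0, 3.0])
    return X, X @ beta_known, beta_known

def fit_normal_equations(X, y):
    """Solve (X'X) b = X'Y directly; more sensitive to rounding when X'X is ill-conditioned."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def fit_lstsq(X, y):
    """Least squares through NumPy's general-purpose solver."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

def max_abs_error(fit, X, y, beta_known):
    """Largest absolute discrepancy between fitted and known coefficients."""
    return float(np.max(np.abs(fit(X, y) - beta_known)))

X, y, beta_known = make_test_problem()
for name, fit in [("normal equations", fit_normal_equations), ("lstsq", fit_lstsq)]:
    print(f"{name}: max |b - beta| = {max_abs_error(fit, X, y, beta_known):.2e}")
```

The same harness can be pointed at any fitting routine by wrapping it in a function that takes X and Y and returns the coefficient vector.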




FIGURE 7.9  Computer printout for Zarthan Company example (SAS, Ref. 7.1)

THE X'X MATRIX

                 INTERCEPT          TARGTP           INCOME
INTERCEPT            15.00         3626.00         44428.00
TARGTP             3626.00      1067614.00      11419181.00
INCOME            44428.00     11419181.00     139063428.00

X'X INVERSE MATRIX  [annotation: (X'X)⁻¹]

                 INTERCEPT          TARGTP           INCOME
INTERCEPT       1.24634842      0.00021297      -0.00041567
TARGTP          0.00021297      0.00000773      -0.00000070
INCOME         -0.00041567     -0.00000070       0.00000020

PARAMETER      ESTIMATE     T FOR H0:       PR > |T|    STD ERROR OF
                            PARAMETER=0                 ESTIMATE
INTERCEPT    3.45261279        1.42          0.1809      2.43065049
TARGTP       0.49600496       81.92          0.0001      0.00605444
INCOME       0.00919906        9.50          0.0001      0.00096811

[annotations: ESTIMATE = b_0, b_1, b_2; T = t* = b_k/s(b_k);
 PR > |T| = two-sided P-value; STD ERROR OF ESTIMATE = s(b_k)]

SOURCE             DF    SUM OF SQUARES       MEAN SQUARE    F VALUE    PR > F
MODEL               2    53844.71643444    26922.35821722    5679.47    0.0001
ERROR              12       56.88356556        4.74029713
CORRECTED TOTAL    14    53901.60000000

R-SQUARE  0.998945          STD DEV  2.17722234

[annotations: SUM OF SQUARES = SSR, SSE, SSTO; MEAN SQUARE = MSR, MSE;
 F VALUE = F*; PR > F = one-sided P-value; R-SQUARE = R²; STD DEV = √MSE]

OBSERVATION      OBSERVED         PREDICTED        RESIDUAL
                 VALUE (Y_i)      VALUE (Ŷ_i)      (e_i)
     1          162.00000000     161.89572437      0.10427563
     2          120.00000000     122.66731763     -2.66731763
     3          223.00000000     224.42936429     -1.42936429
     4          131.00000000     131.24062439     -0.24062439
     5           67.00000000      67.69928353     -0.69928353
     6          169.00000000     169.68485530     -0.68485530
     7           81.00000000      79.73193570      1.26806430
     8          192.00000000     189.67200303      2.32799697
     9          116.00000000     119.83201895     -3.83201895
    10           55.00000000      53.29052354      1.70947646
    11          252.00000000     253.71505760     -1.71505760
    12          232.00000000     228.69079490      3.30920510
    13          144.00000000     144.97934226     -0.97934226
    14          103.00000000     100.53307489      2.46692511
    15          212.00000000     210.93805961      1.06194039



Caution  about  hidden  extrapolations 

Before concluding this illustration of multiple regression analysis, we should
caution again about making estimates or predictions outside the scope of the
model. The danger, of course, is that the model may not be appropriate when
extended outside the region of the observations. In multiple regression, it is
particularly easy to lose track of this region since the levels of X_1, ..., X_p-1
jointly define the region. Thus, one cannot merely look at the ranges of each
independent variable. Consider Figure 7.10, where the shaded region is the
region of observations for a multiple regression application with two independent
variables. The circled dot is within the ranges of the independent variables X_1
and X_2 individually, yet is well outside the joint region of observations.

FIGURE 7.10  Region of observations on X_1 and X_2 jointly, compared with
ranges of X_1 and X_2 individually
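
A numerical screen for such hidden extrapolations can be sketched as follows. The device used here, comparing the quantity x'(X'X)⁻¹x for the new point with its largest value among the observed points, is a common one but is our assumption and is not taken from this section. The data and function names are hypothetical and chosen only so that the observations form a diagonal band like the shaded region of Figure 7.10.

```python
import numpy as np

def leverage(X_obs, x_new):
    """Return x'(X'X)^{-1}x, with an intercept term added to the data and to x_new."""
    X = np.column_stack([np.ones(len(X_obs)), X_obs])
    x = np.concatenate([[1.0], np.asarray(x_new, dtype=float)])
    return float(x @ np.linalg.solve(X.T @ X, x))

def is_hidden_extrapolation(X_obs, x_new):
    """Flag x_new when its leverage exceeds the largest leverage among the observations."""
    h_obs = [leverage(X_obs, row) for row in X_obs]
    return leverage(X_obs, x_new) > max(h_obs)

# Hypothetical data: X1 and X2 rise together, so the observations form a diagonal band.
x1 = np.arange(1.0, 11.0)
x2 = x1 + np.tile([0.5, -0.5], 5)
X_obs = np.column_stack([x1, x2])

# (9, 2) lies inside the range of X1 and inside the range of X2, but far off the band.
print(is_hidden_extrapolation(X_obs, [9.0, 2.0]))
# (5, 5) lies near the centre of the band.
print(is_hidden_extrapolation(X_obs, [5.0, 5.0]))
```

A plot of the observations on X_1 and X_2, as in Figure 7.10, remains the most direct check when there are only two independent variables; a numerical screen of this kind is mainly useful when there are more.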


7.9  STANDARDIZED  REGRESSION  COEFFICIENTS 

Standardized regression coefficients have been proposed to facilitate
comparisons between regression coefficients. Ordinarily, it is difficult to compare
regression coefficients because of differences in the units involved. We cite two
examples.

1.  When considering the fitted response function:

Ŷ = 200 + 20,000X_1 + .2X_2

one may be tempted to conclude that X_1 is the only important independent
variable, and that X_2 has little effect on the dependent variable Y. A little
reflection should make one wary of this conclusion. The reason is that we do not
know the units involved. Suppose the units are:

Y in dollars
X_1 in thousand dollars
X_2 in cents

In that event, the effect on the mean response of a $1,000 increase in X_1 when X_2
is constant would be exactly the same as the effect of a $1,000 increase in X_2
when X_1 is constant, despite the difference in the regression coefficients.
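
The arithmetic behind this statement can be spelled out in a few lines. The sketch below is in Python (our choice of language); the variable names are illustrative, and the units are those supposed in the example.

```python
# Coefficients of the fitted function: Y-hat = 200 + 20,000*X1 + .2*X2
B1, B2 = 20_000.0, 0.2

# Supposed units: Y in dollars, X1 in thousand dollars, X2 in cents.
increase_dollars = 1_000.0
delta_x1 = increase_dollars / 1_000.0   # a $1,000 increase is 1 unit of X1
delta_x2 = increase_dollars * 100.0     # a $1,000 increase is 100,000 units of X2

effect_x1 = B1 * delta_x1   # change in fitted Y (dollars), X2 held constant
effect_x2 = B2 * delta_x2   # change in fitted Y (dollars), X1 held constant
print(effect_x1, effect_x2)
```

Both effects come to $20,000, even though the coefficients differ by a factor of 100,000.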

2.  In our Zarthan Company example, we cannot make any comparison
between b_1 and b_2 because b_1 is in units of one gross of jars per thousand persons
while b_2 is in units of one gross of jars per dollar of per capita discretionary
income.

Standardized regression coefficients, also sometimes called beta coefficients,
are defined as follows:

(7.69)  B_k = b_k \left( \frac{s_{X_k}}{s_Y} \right) = b_k \left[ \frac{\frac{1}{n-1} \sum (X_{ik} - \bar{X}_k)^2}{\frac{1}{n-1} \sum (Y_i - \bar{Y})^2} \right]^{1/2} = b_k \left[ \frac{\sum (X_{ik} - \bar{X}_k)^2}{\sum (Y_i - \bar{Y})^2} \right]^{1/2}

where s_{X_k} and s_Y are the standard deviations of the X_k and Y observations,
respectively. The effect of the term in brackets is to make B_k dimensionless. The
coefficient B_k reflects the change in the mean response (in units of standard
deviations of Y) per unit change in the independent variable X_k (in units of
standard deviations of X_k) when all other independent variables are held
constant.

For our Zarthan Company example, we obtain the following standardized
regression coefficients, using the final expression in (7.69):

B_1 = .496 \left[ \frac{191,089}{53,902} \right]^{1/2} = .934

B_2 = .00920 \left[ \frac{7,473,616}{53,902} \right]^{1/2} = .108
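
These two values can be checked directly from the final expression in (7.69). The short Python sketch below (our choice of language; the function name is illustrative) uses only the rounded quantities quoted in the text.

```python
import math

def standardized_coefficient(b_k, ss_xk, ss_y):
    """B_k = b_k * sqrt( sum (X_ik - Xbar_k)^2 / sum (Y_i - Ybar)^2 ), final form of (7.69)."""
    return b_k * math.sqrt(ss_xk / ss_y)

SS_Y = 53_902.0   # sum of squared deviations of Y, as rounded in the text

B1 = standardized_coefficient(0.496, 191_089.0, SS_Y)       # target population
B2 = standardized_coefficient(0.00920, 7_473_616.0, SS_Y)   # discretionary income
print(round(B1, 3), round(B2, 3))
```

Because the factor 1/(n - 1) cancels in the ratio of the two variances, only the sums of squared deviations are needed.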


Sometimes, these standardized regression coefficients are interpreted as
showing that target population (X_1) has a greater impact than per capita
discretionary income (X_2) on sales because B_1 is much larger than B_2. However, as we
will see in the next chapter, one must be cautious about interpreting regression
coefficients, whether standardized or not. The reason is that when the
independent variables are correlated among themselves, the regression coefficients are
affected by the other independent variables in the model. We shall discuss this
problem in Chapter 8.

Not only does the presence of correlations among the independent variables
affect the magnitude of the standardized regression coefficients but the spacings
of the observations on the independent variables also affect the standardized
regression coefficients. Sometimes the spacings of the observations on the
independent variables may be quite arbitrary.

Hence, it is ordinarily not wise to interpret a standardized regression
coefficient as reflecting the importance of the independent variable.


7.10  WEIGHTED  LEAST  SQUARES 


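Unequal weighting of the observations can be carried out in multiple
regression as in simple regression. Let the weights w_i be contained in the diagonal
matrix W of weights:

(7.70)  \underset{n \times n}{\mathbf{W}} = \begin{bmatrix} w_1 & 0 & \cdots & 0 \\ 0 & w_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & w_n \end{bmatrix}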


The  weighted  least  squares  estimators  of  the  regression  coefficients  then  are: 

(7.71)  \underset{p \times 1}{\mathbf{b}} = (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{Y}

When the error term variances σ_i² are not equal, the weights w_i are chosen to be
inversely proportional to σ_i², so that σ_i² = σ²/w_i. The estimated
variance-covariance matrix of the regression coefficients then is:

(7.72)  \underset{p \times p}{\mathbf{s}^2(\mathbf{b})} = MSE_w (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}

where MSE_w is based on the weighted squared deviations:

(7.72a)  MSE_w = \frac{\sum w_i (Y_i - \hat{Y}_i)^2}{n - p}


As for simple regression, the appropriate weights w_i may often be found from
basic relationships. For instance, it may be found that the error term variances σ_i²
are proportional to X_ik², the level of the kth independent variable squared. Then
σ_i² = σ²X_ik², and the weights w_i = 1/X_ik² would be used.
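
Formulas (7.71), (7.72), and (7.72a) translate almost line for line into matrix code. The following is a minimal sketch in Python with NumPy (our choice of tool); the function name and the small data set are hypothetical, and the weights 1/X_i1² reflect the assumption that the error variance is proportional to X_i1².

```python
import numpy as np

def weighted_least_squares(X, y, w):
    """Return b = (X'WX)^{-1} X'WY, MSE_w, and s^2(b) = MSE_w (X'WX)^{-1}."""
    W = np.diag(w)                                  # diagonal weight matrix (7.70)
    XtWX = X.T @ W @ X
    b = np.linalg.solve(XtWX, X.T @ W @ y)          # (7.71)
    resid = y - X @ b
    n, p = X.shape
    mse_w = float(w @ resid**2) / (n - p)           # (7.72a)
    cov_b = mse_w * np.linalg.inv(XtWX)             # (7.72)
    return b, mse_w, cov_b

# Hypothetical data in which the scatter of Y is taken to grow with X1.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([3.1, 4.8, 7.4, 8.5, 11.9, 12.2])
X = np.column_stack([np.ones_like(x1), x1])         # intercept column plus X1

b, mse_w, cov_b = weighted_least_squares(X, y, 1.0 / x1**2)
print("b =", b)
print("MSE_w =", mse_w)
print("s(b) =", np.sqrt(np.diag(cov_b)))
```

With all weights equal to 1, the function reduces to ordinary least squares, which gives a convenient check on the implementation.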


PROBLEMS 


7.1.  Refer to Figure 7.4a. By how much approximately does mean yield increase when
rainfall increases from 9 to 11 inches and temperature is held constant? Could you
have answered this question if rainfall and temperature interact in their effects on
crop yield?

7.2.  Consider the response function E(Y) = 25 + 3X_1 + 4X_2 + 1.5X_1X_2.

a.  Plot the response function against X_1 when X_2 = 3 and when X_2 = 6. How is
the interaction effect of X_1 and X_2 on Y apparent from this graph?

b.  Sketch a set of contour curves for the response surface. How is the interaction
effect of X_1 and X_2 on Y apparent from this graph?

7.3.  Consider the response function E(Y) = 14 + 7X_1 − 5X_2.

a.  Plot the response function against X_2 when X_1 = 1 and when X_1 = 4. How
does the graph indicate that the effects of X_1 and X_2 on Y are additive?

b.  Sketch a set of contour curves for the response surface. How does the graph
indicate that the effects of X_1 and X_2 on Y are additive?

7.4.  Set up the X matrix and β vector for each of the following models (assume
i = 1, ..., 4):

a.  Y_i = β_0 + β_1X_i1 + β_2X_i1X_i2 + ε_i

b.  log Y_i = β_0 + β_1X_i1 + β_2X_i2 + ε_i

7.5.  Set up the X matrix and β vector for each of the following models (assume
i = 1, ..., 5):

a.  Yt  =  frXn  +  P2Xi2  +  jS^  +  st 

b.  √Y_i = β_0 + β_1X_i1 + β_2 log10 X_i2 + ε_i

7.6.  A student stated: “Adding independent variables to a regression model can never
reduce R², so we should include all available independent variables in the
model.” Comment.

7.7.  Why is it not meaningful to attach a sign to the coefficient of multiple correlation
R, although we do so for the coefficient of simple correlation r?

7.8.  Brand preference. In a small-scale study of the relation between degree of
brand liking (Y) and moisture content (X_1) and sweetness (X_2) of the product, the
following results were obtained (data are coded):

   i:   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15   16
X_i1:   4   4   4   4   6   6   6   6   8   8   8   8  10  10  10   10
X_i2:   2   4   2   4   2   4   2   4   2   4   2   4   2   4   2    4
 Y_i:  64  73  61  76  72  80  71  83  83  89  86  93  88  95  94  100

Assume that regression model (7.1) with independent normal error terms is
appropriate.

a.  Find the estimated regression coefficients. State the estimated regression
function. How is b_1 interpreted here?

b.  Test whether there is a regression relation using a level of significance of .01.
State the alternatives, decision rule, and conclusion. What does your test
imply about β_1 and β_2?

c.  What is the P-value of the test in part (b)?

d.  Estimate β_1 and β_2 jointly by the Bonferroni procedure using a 99 percent
family confidence coefficient. Interpret your results.

7.9.  Refer to Brand preference Problem 7.8.

a.  Calculate the coefficient of multiple determination R². How is it interpreted
here?

b.  Calculate the coefficient of simple determination r² between Y_i and Ŷ_i. Does it
equal R²?

7.10.  Refer to Brand preference Problem 7.8.

a.  Obtain an interval estimate of E(Y_h) when X_h1 = 5 and X_h2 = 4. Use a 99
percent confidence coefficient. Interpret your interval estimate.

b.  Obtain a prediction interval for a new observation Y_h(new) when X_h1 = 5 and
X_h2 = 4. Use a 99 percent confidence coefficient.

7.11.  Refer to Brand preference Problem 7.8.

a.  Obtain the residuals.

b.  Plot the residuals against Ŷ, X_1, and X_2 on separate graphs. Also prepare a
normal probability plot. Analyze the plots and summarize your findings.

c.  Conduct a formal test for lack of fit of the first-order regression function; use
α = .01. State the alternatives, decision rule, and conclusion.

7.12.  Chemical shipment. The observations to follow, taken on 20 incoming
shipments of chemicals in drums arriving at a warehouse, show number of drums in
shipment (X_1), total weight of shipment (X_2, in hundred pounds), and number of
minutes required to handle shipment (Y).

   i:      1      2      3      4      5      6      7      8      9     10
X_i1:      7     18      5     14     11      5     23      9     16      5
X_i2:   5.11  16.72   3.20   7.03  10.98   4.04  22.07   7.03  10.62   4.76
 Y_i:     58    152     41     93    101     38    203     78    117     44

   i:     11     12     13     14     15     16     17     18     19     20
X_i1:     17     12      6     12      8     15     17     21      6     11
X_i2:  11.02   9.51   3.79   6.45   4.60  13.86  13.03  15.21   3.64   9.57
 Y_i:    121    112     50     82     48    127    140    155     39     90

Assume that regression model (7.1) with independent normal error terms is
appropriate.

a.  Obtain the estimated regression function. How is b_1 here interpreted? How is
b_2 here interpreted?

b.  Test whether there is a regression relation, using a level of significance of
.05. State the alternatives, decision rule, and conclusion. What does your test
result imply about β_1 and β_2? What is the P-value of the test?

c.  Estimate β_1 and β_2 jointly by the Bonferroni procedure using a 95 percent
family confidence coefficient. Interpret your results.

d.  Calculate the coefficient of multiple determination R². How is this measure
interpreted here?

7.13.  Refer to Chemical shipment Problem 7.12.

a.  Management desires simultaneous interval estimates of the mean handling
times for five typical shipments specified to be as follows:

           1      2      3      4      5
X_1:       5      6     10     14     20
X_2:    3.20   4.80   7.00  10.00  18.00

Obtain the family of estimates using a 95 percent family confidence
coefficient. Employ the Working-Hotelling type bounds or the Bonferroni
procedure, whichever is more efficient.

b.  For the observations in Problem 7.12, would you consider a shipment of 20
drums with a weight of 5 hundred pounds to be within the scope of the
model? What about a shipment of 20 drums with a weight of 19 hundred
pounds? Support your answer by preparing a relevant plot.

7.14.  Refer to Chemical shipment Problem 7.12. Four separate shipments with the
following characteristics will arrive in the next day or two:

           1      2      3      4
X_1:       9     12     15     18
X_2:    7.20   9.00  12.50  16.50

Management desires predictions of the handling times for these shipments so that
the actual handling times can be compared with the predicted times to determine
whether any are “out of line.” Develop the needed predictions using the most
efficient approach and a family confidence coefficient of 95 percent.

7.15.  Refer to Chemical shipment Problem 7.12.

a.  Obtain the residuals and plot them against Ŷ, X_1, and X_2 on separate graphs.
Also prepare a normal probability plot. Analyze the plots and summarize
your findings.

b.  Can you conduct a formal test for lack of fit here?

7.16.  Refer to Chemical shipment Problem 7.12. Three new shipments are to be
received, each with X_h1 = 7 and X_h2 = 6.

a.  Obtain a 95 percent prediction interval for the mean handling time for these
shipments.

b.  Convert the interval obtained in part (a) into a 95 percent prediction interval
for the total handling time for the three shipments.

7.17.  Patient satisfaction. A hospital administrator wished to study the relation
between patient satisfaction (Y) and patient’s age (X_1, in years), severity of illness
(X_2, an index), and anxiety level (X_3, an index). She randomly selected 23
patients and collected the data presented below, where larger values of Y, X_2, and X_3
are, respectively, associated with more satisfaction, increased severity of illness,
and more anxiety.

   i:     1     2     3     4     5     6     7     8     9    10    11    12
X_i1:    50    36    40    41    28    49    42    45    52    29    29    43
X_i2:    51    46    48    44    43    54    50    48    62    50    48    53
X_i3:   2.3   2.3   2.2   1.8   1.8   2.9   2.2   2.4   2.9   2.1   2.4   2.4
 Y_i:    48    57    66    70    89    36    46    54    26    77    89    67

   i:    13    14    15    16    17    18    19    20    21    22    23
X_i1:    38    34    53    36    33    29    33    55    29    44    43
X_i2:    55    51    54    49    56    46    49    51    52    58    50
X_i3:   2.2   2.3   2.2   2.0   2.5   1.9   2.1   2.4   2.3   2.9   2.3
 Y_i:    47    51    57    66    79    88    60    49    77    52    60

Assume that regression model (7.5) for three independent variables with
independent normal error terms is appropriate.

a.  Obtain the estimated regression function.

b.  Test whether there is a regression relation; use a .10 level of significance.
State the alternatives, decision rule, and conclusion. What does your test
imply about β_1, β_2, and β_3? What is the P-value of the test?

c.  Obtain joint interval estimates of β_1, β_2, and β_3 using a 90 percent family
confidence coefficient. Interpret your results.

d.  Calculate the coefficient of multiple correlation. What does it indicate here?


7.18.  Refer to Patient satisfaction Problem 7.17.

a.  Obtain an interval estimate of the mean satisfaction when X_h1 = 35, X_h2 =
45, and X_h3 = 2.2. Use a 90 percent confidence coefficient. Interpret your
confidence interval.

b.  Obtain a prediction interval for a new patient’s satisfaction when X_h1 = 35,
X_h2 = 45, and X_h3 = 2.2. Use a 90 percent confidence coefficient. Interpret
your prediction interval.

7.19.  Refer to Patient satisfaction Problem 7.17.

a.  Obtain the residuals and plot them against Ŷ and each of the independent
variables on separate graphs. Also prepare a normal probability plot. Analyze
your plots and summarize your findings.

b.  Can you conduct a formal test for lack of fit here?

7.20.  Mathematicians salaries. A researcher in a scientific foundation wished to
evaluate the relation between intermediate and senior level annual salaries of
research mathematicians (Y, in thousand dollars) and an index of publication
quality (X_1), number of years of experience (X_2), and an index of success in obtaining
grant support (X_3). The data for a sample of 24 intermediate and senior level
research mathematicians follow.

   i:      1     2     3     4     5     6     7     8     9    10    11    12
X_i1:    3.5   5.3   5.1   5.8   4.2   6.0   6.8   5.5   3.1   7.2   4.5   4.9
X_i2:      9    20    18    33    31    13    25    30     5    47    25    11
X_i3:    6.1   6.4   7.4   6.7   7.5   5.9   6.0   4.0   5.8   8.3   5.0   6.4
 Y_i:   33.2  40.3  38.7  46.8  41.4  37.5  39.0  40.7  30.1  52.9  38.2  31.8

   i:     13    14    15    16    17    18    19    20    21    22    23    24
X_i1:    8.0   6.5   6.6   3.7   6.2   7.0   4.0   4.5   5.9   5.6   4.8   3.9
X_i2:     23    35    39    21     7    40    35    23    33    27    34    15
X_i3:    7.6   7.0   5.0   4.4   5.5   7.0   6.0   3.5   4.9   4.3   8.0   5.0
 Y_i:   43.3  44.1  42.8  33.6  34.2  48.0  38.0  35.9  40.4  36.8  45.2  35.1

Assume that regression model (7.5) for three independent variables with
independent normal error terms is appropriate.

a.  Obtain the estimated regression function.

b.  Test whether there is a regression relation; use α = .05. State the alternatives,
decision rule, and conclusion. What does your test imply about β_1, β_2, and
β_3? What is the P-value of the test?

c.  Estimate β_1, β_2, and β_3 jointly by the Bonferroni procedure using a 95
percent family confidence coefficient. Interpret your results.

d.  Calculate R² and interpret this measure.

7.21.  Refer to Mathematicians salaries Problem 7.20. The researcher wishes to obtain
simultaneous interval estimates of the mean salary levels for four typical research
mathematicians specified as follows:

          1     2     3     4
X_1:    5.0   6.0   4.0   7.0
X_2:     20    30    10    50
X_3:    5.0   6.0   4.0   7.0

Obtain the family of estimates using a 95 percent family confidence coefficient.
Employ the most efficient procedure.


7.22.  Refer to Mathematicians salaries Problem 7.20. Three research mathematicians
with the following characteristics did not provide any salary information in the
study.

          1     2     3
X_1:    5.4   6.2   6.4
X_2:     17    12    21
X_3:    6.0   5.8   6.1

Develop separate prediction intervals for the annual salaries of these
mathematicians using a 95 percent statement confidence coefficient in each case. Can the
salaries of these three mathematicians be predicted fairly precisely?

7.23.  Refer to Mathematicians salaries Problem 7.20.

a.  Obtain the residuals and plot them against Ŷ and each of the independent
variables on separate graphs. Also prepare a normal probability plot. Analyze
your plots and summarize your findings.

b.  Can you conduct a formal test for lack of fit here?

7.24.  Refer to Chemical shipment Problem 7.12.

a.  Obtain the standardized regression coefficients.

b.  Calculate the coefficient of determination between the two independent
variables. Is it meaningful here to consider the standardized regression coefficients
to reflect the effect of one independent variable when the other is held
constant?

7.25.  Refer to Patient satisfaction Problem 7.17.

a.  Obtain the standardized regression coefficients.

b.  Calculate the coefficients of determination between all pairs of independent
variables. Do these indicate that it is meaningful here to consider the
standardized regression coefficients as indicating the effect of one independent
variable when the others are held constant?


EXERCISES

7.26.  For each of the following models, indicate whether it is a general linear regression
model. If it is not, state whether it can be expressed in the form of (7.7) by a
suitable transformation:

a.  Y_i = β_0 + β_1X_i1 + β_2 log10 X_i2 + β_3X_i1² + ε_i

b.  Y_i = ε_i exp(β_0 + β_1X_i1 + β_2X_i2²)

c.  Y_i = β_0 + log10(β_1X_i1) + β_2X_i2 + ε_i

d.  Y_i = β_0 exp(β_1X_i1) + ε_i

e.  Y_i = [1 + exp(β_0 + β_1X_i1 + ε_i)]⁻¹

7.27.  (Calculus needed.) Consider the multiple regression model:

Y_i = β_1X_i1 + β_2X_i2 + ε_i     i = 1, ..., n

where the ε_i are uncorrelated, with E(ε_i) = 0 and σ²(ε_i) = σ².

a.  Derive the least squares estimators of β_1 and β_2.

b.  Assuming that the ε_i are independent normal random variables, state the
likelihood function and obtain the maximum likelihood estimators of β_1 and
β_2. Are these the same as the least squares estimators?

7.28.  (Calculus needed.) Consider the multiple regression model:

Y_i = β_0 + β_1X_i1 + β_2X_i1² + β_3X_i2 + ε_i     i = 1, ..., n

where the ε_i are independent N(0, σ²). Derive the least squares normal equations.
Will these yield the same estimators of the regression coefficients as the maximum
likelihood estimators?

7.29.  An analyst wanted to fit the regression model Y_i = β_0 + β_1X_i1 + β_2X_i2 +
β_3X_i3 + ε_i, i = 1, ..., n, by the method of least squares when it is known that
β_2 = 4. How can the analyst obtain the desired fit using a multiple regression
computer program?

7.30.  For regression model (7.1), show that the coefficient of simple determination r²
between Y_i and Ŷ_i equals the coefficient of multiple determination R².

7.31.  In a small-scale regression study, the following data were obtained:

   i:    1    2    3    4    5    6
X_i1:    7    4   16    3   21    8
X_i2:   33   41    7   49    5   31
 Y_i:   42   33   75   28   91   55

Assume that regression model (7.1) with independent normal error terms is
appropriate. Using matrix methods, obtain (a) b; (b) e; (c) SSE; (d) SSR; (e) s²(b); (f) Ŷ_h
when X_h1 = 10, X_h2 = 30; (g) s²(Ŷ_h) when X_h1 = 10, X_h2 = 30.


PROJECTS 

7.32.  Refer to the SMSA data set. You have been asked to evaluate two alternative
models for predicting the number of active physicians (Y) in an SMSA. Proposed
model I includes as independent variables total population (X_1), land area (X_2),
and total personal income (X_3). Proposed model II includes as independent
variables population density (X_1, total population divided by land area), percent of
population in central cities (X_2), and total personal income (X_3).

a.  For each of the two proposed models, fit the first-order regression model
(7.5) with three independent variables.

b.  Calculate R² for each model. Is one model clearly preferable in terms of this
measure?

c.  For each model, obtain the residuals and plot them against Ŷ and against each
of the three independent variables. Also prepare a normal probability plot for
each of the two fitted models. Analyze your plots and state your findings. Is
one model clearly preferable in terms of aptness?

7.33.  Refer to the SMSA data set.

a.  For each geographic region, regress the number of serious crimes in an
SMSA (Y) against population density (X_1, total population divided by land
area), total personal income (X_2), and percent high school graduates (X_3).
Use the first-order regression model (7.5) with three independent variables.
State the estimated regression functions.

b.  Are the estimated regression functions similar for the four regions? Discuss.

c.  Calculate MSE and R² for each region. Are these measures similar for the
four regions? Discuss.

7.34.  Refer to the SENIC data set. Two models have been proposed for predicting the
average length of patient stay in a hospital (Y). Model I utilizes as independent
variables age (X_1), infection risk (X_2), and available facilities and services (X_3).
Model II uses as independent variables number of beds (X_1), infection risk (X_2),
and available facilities and services (X_3).

a.  For each of the two proposed models, fit the first-order regression model
(7.5) with three independent variables.

b.  Calculate R² for each model. Is one model clearly preferable in terms of this
measure?

c.  For each model, obtain the residuals and plot them against Ŷ and against each
of the three independent variables. Also prepare a normal probability plot for
each of the two fitted models. Analyze your plots and state your findings. Is
one model clearly preferable in terms of aptness?

7.35.  Refer to the SENIC data set.

a.  For each geographic region, regress infection risk (Y) against the independent
variables age (X_1), routine culturing ratio (X_2), average daily census (X_3), and
available facilities and services (X_4). Use the first-order regression model
(7.5) with four independent variables. State the estimated regression
functions.

b.  Are the estimated regression functions similar for the four regions? Discuss.

c.  Calculate MSE and R² for each region. Are these measures similar for the
four regions? Discuss.


CITED  REFERENCE 

7.1  SAS  User’s  Guide.  1979  ed.  Raleigh,  N.C.:  SAS  Institute,  1979. 


Multiple  regression — II 


In  this  chapter,  we  continue  our  discussion  of  multiple  regression  by  first 
considering  multicollinearity  and  its  effects  in  multiple  regression  models.  Then 
we  take  up  several  other  topics  in  multiple  regression,  including  additional  tests 
of  hypotheses  concerning  the  regression  coefficients. 

8.1  MULTICOLLINEARITY  AND  ITS  EFFECTS 

In  multiple  regression  analysis,  one  is  often  concerned  with  the  nature  and 
significance  of  the  relations  between  the  independent  variables  and  the  dependent 
variable.  Questions  that  are  frequently  asked  include: 

1.  What  is  the  relative  importance  of  the  effects  of  the  different  independent 
variables? 

2.  What  is  the  magnitude  of  the  effect  of  a  given  independent  variable  on  the 
dependent  variable? 

3.  Can  any  independent  variable  be  dropped  from  the  model  because  it  has 
little  or  no  effect  on  the  dependent  variable? 

4.  Should any independent variables not yet included in the model be
considered for possible inclusion?

If the independent variables included in the model are (1) uncorrelated among
themselves and (2) uncorrelated with any other independent variables that are
related to the dependent variable but omitted from the model, relatively simple
answers can be given to these questions. Unfortunately, in many nonexperimental
situations in business, economics, and the social and biological sciences, the
independent variables tend to be correlated among themselves and with other
variables that are related to the dependent variable but are not included in the
model.

When the independent variables are correlated among themselves,
intercorrelation or multicollinearity among them is said to exist. (Sometimes the
latter term is reserved for those instances when the correlation among
independent variables is very high or even perfect.) We shall explore now a variety of
interrelated problems that are created by multicollinearity among the independent
variables. First, however, we examine the situation when the independent
variables are not correlated.

Example  of  uncorrelated  independent  variables 

Table 8.1 contains data for a small-scale experiment on the effect of work
crew size (X_1) and level of bonus pay (X_2) on crew productivity score (Y). It is
easy to show that X_1 and X_2 are uncorrelated here, i.e., r²_12 = 0, where r²_12
denotes the coefficient of simple determination between X_1 and X_2. Table 8.2a
contains the fitted regression function and analysis of variance table when both
X_1 and X_2 are included in the model. Table 8.2b contains the same information
when only X_1 is included in the model, and Table 8.2c contains this information
when only X_2 is in the model. In Table 8.2a, we use the notation SSR(X_1, X_2) and
SSE(X_1, X_2) to indicate explicitly the two independent variables in the model.
Similarly, in Table 8.2b, we use the notation SSR(X_1) and SSE(X_1) to show that
only independent variable X_1 is in the model, so that the regression here is a
simple regression. We do likewise in Table 8.2c.

TABLE 8.1  Work crew productivity example with uncorrelated independent
variables

Trial    Crew Size    Bonus Pay    Crew Productivity Score
  i         Xi1          Xi2                 Yi

  1          4           $2                  42
  2          4            2                  39
  3          4            3                  48
  4          4            3                  51
  5          6            2                  49
  6          6            2                  53
  7          6            3                  61
  8          6            3                  60


An important feature to note in Table 8.2 is that the regression coefficients for
X1 and X2 are the same, whether only the given independent variable is included


8.1  Multicollinearity  and  its  effects  /  273 


TABLE 8.2  ANOVA tables for work crew productivity example with uncorrelated
independent variables

(a) Regression of Y on X1 and X2
    Ŷ = .375 + 5.375X1 + 9.250X2

Source of
Variation     SS                       df    MS
Regression    SSR(X1, X2) = 402.250     2    MSR(X1, X2) = 201.125
Error         SSE(X1, X2) =  17.625     5    MSE(X1, X2) =   3.525
Total         SSTO        = 419.875     7

(b) Regression of Y on X1
    Ŷ = 23.500 + 5.375X1

Source of
Variation     SS                       df    MS
Regression    SSR(X1) = 231.125         1    MSR(X1) = 231.125
Error         SSE(X1) = 188.750         6    MSE(X1) =  31.458
Total         SSTO    = 419.875         7

(c) Regression of Y on X2
    Ŷ = 27.250 + 9.250X2

Source of
Variation     SS                       df    MS
Regression    SSR(X2) = 171.125         1    MSR(X2) = 171.125
Error         SSE(X2) = 248.750         6    MSE(X2) =  41.458
Total         SSTO    = 419.875         7

in  the  model  or  both  independent  variables  are  included.  This  is  a  result  of  the 
two  independent  variables  being  uncorrelated. 

Thus, if the independent variables are uncorrelated, the effects ascribed to
them by a first-order regression model are the same no matter which other
independent variables are included in the model. This is a strong argument for
controlled experiments whenever possible, since experimental control permits
making the independent variables uncorrelated.
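The invariance of the coefficients can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the data are those of Table 8.1, while the helper name `fit` is hypothetical and introduced only for illustration.

```python
import numpy as np

# Data from Table 8.1 (work crew productivity example).
x1 = np.array([4, 4, 4, 4, 6, 6, 6, 6], dtype=float)   # crew size
x2 = np.array([2, 2, 3, 3, 2, 2, 3, 3], dtype=float)   # bonus pay
y = np.array([42, 39, 48, 51, 49, 53, 61, 60], dtype=float)


def fit(y, *predictors):
    """Least squares fit with an intercept; returns [b0, b1, ...]."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


r12 = np.corrcoef(x1, x2)[0, 1]   # simple correlation between X1 and X2
b_both = fit(y, x1, x2)           # model with X1 and X2
b_x1 = fit(y, x1)                 # model with X1 only
b_x2 = fit(y, x2)                 # model with X2 only

print("r12            :", round(r12, 6))
print("b1 (X1 and X2) :", round(b_both[1], 3), " b1 (X1 only):", round(b_x1[1], 3))
print("b2 (X1 and X2) :", round(b_both[2], 3), " b2 (X2 only):", round(b_x2[1], 3))
```

Only the intercept differs between the three fits; the slope for each variable is the same with or without the other variable in the model.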

Another important feature of Table 8.2 is related to the error sums of squares.
Note from Table 8.2a that the error sum of squares when both X1 and X2 are
included in the model is SSE(X1, X2) = 17.625. When only X1 is included in the
model, however, the error sum of squares is SSE(X1) = 188.750 according to
Table 8.2b. Since the variation in Y when X1 alone is considered is 188.750 but is
only 17.625 when both X1 and X2 are considered, we may ascribe the difference:

SSE(X1) - SSE(X1, X2) = 188.750 - 17.625 = 171.125


274  /  Multiple  regression — II 


to the effect of X2. We shall denote this difference by SSR(X2 | X1):

(8.1)  SSR(X2 | X1) = SSE(X1) - SSE(X1, X2)

When we fit a regression function containing only X2, we also obtain a
measure of the reduction in the variation of Y associated with X2, namely, SSR(X2).
Table 8.2c indicates that SSR(X2) = 171.125, which is the same as SSR(X2 | X1)
= 171.125. The reason for this is that X1 and X2 are uncorrelated.

The story is the same for independent variable X1. Let:

(8.2)  SSR(X1 | X2) = SSE(X2) - SSE(X1, X2)

For our example, we have:

SSR(X1 | X2) = 248.750 - 17.625 = 231.125

This sum of squares is the same as SSR(X1) = 231.125 from Table 8.2b,
obtained when Y is regressed only on X1.

In  general,  when  two  independent  variables  are  uncorrelated,  the  marginal 
contribution  of  an  independent  variable  in  reducing  the  error  sum  of  squares 
when  the  other  independent  variable  is  in  the  model  is  exactly  the  same  as  when 
this  independent  variable  is  in  the  model  alone. 
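The sums of squares in Table 8.2 can be reproduced from the Table 8.1 data. This is a sketch only, assuming NumPy; the helper name `sse` is hypothetical. It computes each error sum of squares and then the extra sums of squares defined in (8.1) and (8.2).

```python
import numpy as np

# Data from Table 8.1 (work crew productivity example).
x1 = np.array([4, 4, 4, 4, 6, 6, 6, 6], dtype=float)
x2 = np.array([2, 2, 3, 3, 2, 2, 3, 3], dtype=float)
y = np.array([42, 39, 48, 51, 49, 53, 61, 60], dtype=float)


def sse(y, *predictors):
    """Error sum of squares for a least squares fit with an intercept."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


ssto = float(((y - y.mean()) ** 2).sum())
sse_1 = sse(y, x1)          # SSE(X1)
sse_2 = sse(y, x2)          # SSE(X2)
sse_12 = sse(y, x1, x2)     # SSE(X1, X2)

ssr_2_given_1 = sse_1 - sse_12   # (8.1)
ssr_1_given_2 = sse_2 - sse_12   # (8.2)
ssr_1 = ssto - sse_1             # SSR(X1)
ssr_2 = ssto - sse_2             # SSR(X2)

print(f"SSR(X2|X1) = {ssr_2_given_1:.3f}   SSR(X2) = {ssr_2:.3f}")
print(f"SSR(X1|X2) = {ssr_1_given_2:.3f}   SSR(X1) = {ssr_1:.3f}")
```

Because X1 and X2 are uncorrelated here, each extra sum of squares agrees with the regression sum of squares obtained when the variable is in the model alone.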

Note 


To show that the regression coefficient of X1 is unchanged when X2 is added to the
regression model in the case where X1 and X2 are uncorrelated, consider the algebraic
expression for b1 in the multiple regression model with two independent variables:


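(8.3)  b_1 = \frac{\dfrac{\sum (X_{i1} - \bar{X}_1)(Y_i - \bar{Y})}{\sum (X_{i1} - \bar{X}_1)^2} - \left[ \dfrac{\sum (Y_i - \bar{Y})^2}{\sum (X_{i1} - \bar{X}_1)^2} \right]^{1/2} r_{Y2} \, r_{12}}{1 - r_{12}^2}
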

where rY2 denotes the coefficient of simple correlation between Y and X2, and r12, as
before, denotes the coefficient of simple correlation between X1 and X2.

If X1 and X2 are uncorrelated, r12 = 0, and (8.3) reduces to:


(8.3a)  b_1 = \frac{\sum (X_{i1} - \bar{X}_1)(Y_i - \bar{Y})}{\sum (X_{i1} - \bar{X}_1)^2}


But (8.3a) is the estimator of the slope for the simple regression of Y on X1, per (2.10a).

Hence, if X1 and X2 are uncorrelated, adding X2 to the regression model does not
change the regression coefficient for X1; correspondingly, adding X2 to the
regression model leaves the coefficient for X1 as in (8.3a), and adding X1 to the
regression model does not change the regression coefficient for X2.
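The expression for b1 in terms of the simple slope and the correlations rY2 and r12 can be checked against a direct least squares fit. The sketch below assumes NumPy; the function name `slope_b1` and the first data set are hypothetical (made up for illustration, not from the text), while the second data set is Table 8.1.

```python
import numpy as np


def slope_b1(y, x1, x2):
    """b1 for the two-variable model, written as in (8.3)."""
    d1 = x1 - x1.mean()
    dy = y - y.mean()
    r_y2 = np.corrcoef(y, x2)[0, 1]
    r_12 = np.corrcoef(x1, x2)[0, 1]
    simple_slope = (d1 @ dy) / (d1 @ d1)        # slope of Y on X1 alone
    scale = np.sqrt((dy @ dy) / (d1 @ d1))
    return (simple_slope - scale * r_y2 * r_12) / (1.0 - r_12 ** 2)


def lstsq_coef(y, x1, x2):
    """Direct least squares fit of Y on X1 and X2 with an intercept."""
    X = np.column_stack([np.ones_like(y), x1, x2])
    return np.linalg.lstsq(X, y, rcond=None)[0]


# Hypothetical correlated data, made up for this illustration.
hx1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
hx2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
hy = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])
b1_formula = slope_b1(hy, hx1, hx2)
b1_direct = lstsq_coef(hy, hx1, hx2)[1]

# Table 8.1 data: r12 = 0, so the formula collapses to the simple slope.
tx1 = np.array([4, 4, 4, 4, 6, 6, 6, 6], dtype=float)
tx2 = np.array([2, 2, 3, 3, 2, 2, 3, 3], dtype=float)
ty = np.array([42, 39, 48, 51, 49, 53, 61, 60], dtype=float)
b1_table = slope_b1(ty, tx1, tx2)

print(b1_formula, b1_direct, b1_table)
```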


Example  of  correlated  independent  variables 

As mentioned earlier, nonexperimental data in many disciplines frequently
consist of correlated independent variables. For example, in a regression of
family food expenditures on the independent variables family income, family
savings, and age of head of the household, the independent variables will be
correlated among themselves. Further, the independent variables will also be
correlated with other socioeconomic variables not included in the model that do
affect family food expenditures, such as family size. We shall now consider the
effects of multicollinearity, i.e., correlated independent variables, upon the
regression coefficients and upon the regression sums of squares.

Effect of multicollinearity on regression coefficients. Table 8.3 contains
data for a study of the relation of body fat (Y) to triceps skinfold thickness (X1)
and thigh circumference (X2), based on a sample of 20 healthy females 25-34
years old. The triceps skinfold thicknesses and thigh circumferences for these
subjects are highly correlated, as the scatter plot in Figure 8.1 suggests. Indeed,
the coefficient of simple correlation between these two independent variables for
the 20 subjects is r12 = +.92.


TABLE 8.3  Body fat example with correlated independent variables

             Triceps
Subject      Skinfold Thickness    Thigh Circumference    Body Fat
   i               Xi1                    Xi2                Yi

   1              19.5                   43.1               11.9
   2              24.7                   49.8               22.8
   3              30.7                   51.9               18.7
   4              29.8                   54.3               20.1
   5              19.1                   42.2               12.9
   6              25.6                   53.9               21.7
   7              31.4                   58.5               27.1
   8              27.9                   52.1               25.4
   9              22.1                   49.9               21.3
  10              25.5                   53.5               19.3
  11              31.1                   42.7               12.8
  16              29.5                   54.4               23.9
  17              27.7                   55.3               22.6
  18              30.2                   58.6               25.4
  19              22.7                   48.2               14.8
  20              25.2                   51.0               21.1


Table 8.4a contains the results of a computer run for regressing Y on X1 and
X2, and presents the fitted regression function and the analysis of variance table.
Again, we use the notation SSR(X1, X2) and SSE(X1, X2) to indicate explicitly
that both independent variables are in the fitted model. Table 8.4b contains the
fitted regression and the analysis of variance table for the regression of Y on X1
only, and Table 8.4c contains the results for the regression of Y on X2 only.

Note first that the regression coefficient for X1, triceps skinfold thickness, is
not the same in Tables 8.4a and b. Thus, the effect ascribed to X1 by the fitted
response function varies here, depending upon whether only X1 or both X1 and X2




FIGURE  8.1  Scatter  plot  of  thigh  circumference  against  triceps 
skinfold  thickness — body  fat  example 


TABLE 8.4  ANOVA table for body fat example with correlated
independent variables

(a) Regression of Y on X1 and X2
    Ŷ = -19.174 + .2224X1 + .6594X2

Source of
Variation     SS                      df    MS
Regression    SSR(X1, X2) = 385.44     2    MSR(X1, X2) = 192.72
Error         SSE(X1, X2) = 109.95    17    MSE(X1, X2) =   6.47
Total         SSTO        = 495.39    19

(b) Regression of Y on X1
    Ŷ = -1.496 + .8572X1

Source of
Variation     SS                      df    MS
Regression    SSR(X1) = 352.27         1    MSR(X1) = 352.27
Error         SSE(X1) = 143.12        18    MSE(X1) =   7.95
Total         SSTO    = 495.39        19

(c) Regression of Y on X2
    Ŷ = -23.634 + .8566X2

Source of
Variation     SS                      df    MS
Regression    SSR(X2) = 381.97         1    MSR(X2) = 381.97
Error         SSE(X2) = 113.42        18    MSE(X2) =   6.30
Total         SSTO    = 495.39        19




are being considered in the model. The reason for the different regression
coefficients is that the independent variables are correlated here, as we saw earlier
from Figure 8.1. If we were to consider as a third independent variable X3,
midarm circumference, the regression coefficient for X1 in the fitted response
function with three independent variables would be different again from the two
coefficients in Table 8.4 since midarm circumference is moderately correlated
with triceps skinfold thickness.

Next, we turn to the regression coefficient for X2, thigh circumference. We
see again from Table 8.4 that the regression coefficient for X2 when X1 is also in
the model is different from the regression coefficient when X2 is the only
independent variable in the model.

The  important  conclusion  we  must  draw  is:  When  independent  variables  are 
correlated,  the  regression  coefficient  of  any  independent  variable  depends  on 
which  other  independent  variables  are  included  in  the  model  and  which  ones  are 
left  out.  Thus,  a  regression  coefficient  does  not  reflect  any  inherent  effect  of  the 
particular  independent  variable  on  the  dependent  variable  but  only  a  marginal  or 
partial  effect,  given  whatever  other  correlated  independent  variables  are  included 
in  the  model. 

Effect of multicollinearity on regression sums of squares. Note from
Table 8.4a that the error sum of squares when both X1 and X2 are included in the
model is SSE(X1, X2) = 109.95. When only X2 is included in the model, the
error sum of squares is SSE(X2) = 113.42 as seen from Table 8.4c. Since the
variation in Y when X2 alone is considered is 113.42 but is 109.95 when both X1
and X2 are considered, we ascribe the difference to the effect of X1. Using (8.2),
we obtain:

SSR(X1 | X2) = SSE(X2) - SSE(X1, X2) = 113.42 - 109.95 = 3.47

When we fit a regression function containing only X1, we also obtain a
measure of the reduction in variation of Y associated with X1, namely, SSR(X1). For
our example, Table 8.4b indicates that SSR(X1) = 352.27, which is not the same
as SSR(X1 | X2) = 3.47. The reason for the large difference is the high positive
correlation between X1 and X2.

The story is the same for the other independent variable. Using (8.1), we
obtain:

SSR(X2 | X1) = SSE(X1) - SSE(X1, X2) = 143.12 - 109.95 = 33.17

This sum of squares is not the same as SSR(X2) = 381.97 from Table 8.4c,
obtained when Y is regressed only on X2.

The  important  conclusion  is:  When  independent  variables  are  correlated,  there 
is  no  unique  sum  of  squares  which  can  be  ascribed  to  an  independent  variable  as 
reflecting  its  effect  in  reducing  the  total  variation  in  Y.  The  reduction  in  the  total 
variation  ascribed  to  an  independent  variable  must  be  viewed  in  the  context  of  the 
other  independent  variables  included  in  the  model,  whenever  the  independent 
variables  are  correlated. 




Let us consider the meaning of SSR(X1) and SSR(X1 | X2) further. SSR(X1)
measures the reduction in the variation of Y when X1 is introduced into the
regression model and no other independent variable is present. SSR(X1 | X2)
measures the further reduction in the variation of Y when X2 is already in the
regression model and X1 is introduced as a second independent variable. Hence,
the notation SSR(X1 | X2) is used to show that the sum of squares measures a
reduction in variation of Y associated with X1, given that X2 is already included
in the model.

The reason why SSR(X1 | X2) = 3.47 is less than SSR(X1) = 352.27 in our
example should now be apparent. Since X1 is highly positively correlated with
X2, much of the power of X1 to reduce the variation in Y is already accounted for
by SSR(X2) when X2 alone is included in the model. Hence, the marginal effect
of X1 in reducing the variation in Y, given that X2 is in the model, is less than the
effect if X1 were introduced into the model without X2 being present. For the
same reason, SSR(X2 | X1) is less than SSR(X2).

The terms SSR(X2 | X1) and SSR(X1 | X2) are called extra sums of squares,
since they indicate the additional or extra reduction in the error sum of squares
achieved by introducing an additional independent variable.
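The arithmetic behind the extra sums of squares for the body fat example can be laid out in a few lines. This sketch uses only the sums of squares reported in Table 8.4; the variable names are hypothetical and chosen for readability.

```python
# Sums of squares reported in Table 8.4 (body fat example).
SSTO = 495.39
SSE = {"X1": 143.12, "X2": 113.42, "X1,X2": 109.95}

# Regression sums of squares follow from SSTO = SSR + SSE.
SSR = {model: SSTO - value for model, value in SSE.items()}

# Extra sums of squares as reductions in the error sum of squares.
ssr_x1_given_x2 = SSE["X2"] - SSE["X1,X2"]   # (8.2)
ssr_x2_given_x1 = SSE["X1"] - SSE["X1,X2"]   # (8.1)

print(f"SSR(X1)    = {SSR['X1']:.2f}   SSR(X1|X2) = {ssr_x1_given_x2:.2f}")
print(f"SSR(X2)    = {SSR['X2']:.2f}   SSR(X2|X1) = {ssr_x2_given_x1:.2f}")
```

With correlated independent variables, the marginal figures are much smaller than the figures obtained when each variable enters alone, in contrast to the work crew example.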

Simultaneous  tests  on  regression  coefficients 

Just as multicollinearity among the independent variables leads to regression
coefficients that vary depending on which correlated independent variables are
included in the model, so does multicollinearity also cause difficulties in
statistical tests of the regression coefficients. A not infrequent abuse in the analysis of
multiple regression models is to examine the t* statistic in (7.42b):

    t* = bk / s(bk)

for each regression coefficient in turn to decide whether or not βk = 0 for
k = 1, ..., p - 1. Even if a simultaneous inference procedure is used, and often
it is not, problems still exist.

Let us consider the first-order regression model with two independent
variables:

(8.4)  Yi = β0 + β1Xi1 + β2Xi2 + εi        Full model

If the test on β1 indicates it is zero, the regression model (8.4) would be:

Yi = β0 + β2Xi2 + εi

If the test on β2 indicates it is zero, the regression model (8.4) would be:

Yi = β0 + β1Xi1 + εi

However, if the separate tests indicate that β1 = 0 and β2 = 0, that does not
necessarily jointly imply that:

Yi = β0 + εi

since neither of the tests considers this alternative.




For an example, consider the data in Table 8.5 for 10 ski resorts in New
England during a period of normal snow conditions. The computer output for the
regression of visitor days (Y) on miles of intermediate trails (X1) and lift capacity
(X2) is summarized in Table 8.6a. The proper test for the existence of a
regression relation:

    H0: β1 = 0 and β2 = 0
    Ha: not both β1 and β2 equal 0

is the F test of (7.30). The test statistic for our example is (Table 8.6a):

    F* = MSR / MSE = 811,865,088 / 2,757,701 = 294

Controlling the level of significance at .05, we require F(.95; 2, 7) = 4.74.
Since F* = 294 > 4.74, we conclude Ha, that there is a regression relation
between Y and the independent variables X1 and X2. Hence, at least one of the
two regression coefficients does not equal zero. The P-value for the test is less
than .001 because F(.999; 2, 7) = 21.7.

Let us now examine the t* statistics at a 5 percent family level of significance
by the Bonferroni technique. We require t(.9875; 7) = 2.84. Since both t*
statistics have absolute values that do not exceed 2.84 (Table 8.6a), we would
conclude β1 = 0 and β2 = 0, contrary to the earlier conclusion that not both
coefficients equal zero.

To understand this apparently paradoxical result, let us investigate the test of,
say:

    H0: β2 = 0
    Ha: β2 ≠ 0


by  the  general  linear  test  approach.  We  first  obtain  the  error  sum  of  squares 

TABLE 8.5  Ski resort example with highly correlated independent
variables

  Ski       Miles of          Lift Capacity     Total Visitor
Resort    Intermediate         (skiers           Days during
           Trails              per hour)        Sample Period
   i         Xi1                 Xi2                 Yi

   1         10.5               2,200              19,929
   2          2.5               1,000               5,839
   3         13.1               3,250              23,696
   4          4.0               1,475               9,881
   5         14.7               3,800              30,011
   6          3.6               1,200               7,241
   7          7.1               1,900              11,634
   8         22.5               5,575              45,684
   9         17.0               4,200              36,476
  10          6.4               1,850              12,068




TABLE 8.6  Selected computer outputs for ski resort example with highly correlated
independent variables

(a) Regression of Y on X1 and X2
    Ŷ = -1,806.82 + 1,131.03X1 + 4.00X2

Source of
Variation     SS                             df    MS
Regression    SSR(X1, X2) = 1,623,730,176     2    MSR(X1, X2) = 811,865,088
Error         SSE(X1, X2) =    19,303,908     7    MSE(X1, X2) =   2,757,701
Total         SSTO        = 1,643,034,084     9

              Estimated                  Estimated Standard
Variable      Regression Coefficient     Deviation             t*
X1            b1 = 1,131.03              s(b1) = 615.76        1.837
X2            b2 = 4.002                 s(b2) = 2.71          1.477

(b) Regression of Y on X1
    Ŷ = -363.98 + 2,032.53X1

Source of
Variation     SS                             df    MS
Regression    SSR(X1) = 1,617,707,400         1    MSR(X1) = 1,617,707,400
Error         SSE(X1) =    25,326,684         8    MSE(X1) =     3,165,836
Total         SSTO    = 1,643,034,084         9


SSE(F) for the full model in (8.4), associated with n - 3 degrees of freedom.
Next, we formulate the reduced model under H0:

(8.5)  Yi = β0 + β1Xi1 + εi        Reduced model

and obtain the error sum of squares SSE(R) for the reduced model. Associated
with SSE(R) are n - 2 degrees of freedom. The test statistic (3.68) then leads to:

    F^* = \frac{SSE(R) - SSE(F)}{(n - 2) - (n - 3)} \div \frac{SSE(F)}{n - 3}

In our notation recognizing explicitly the independent variables in the model,
we have from the two parts of Table 8.6:

    SSE(F) = SSE(X1, X2) = 19,303,908
    SSE(R) = SSE(X1)     = 25,326,684

so that by (8.1):

    SSE(R) - SSE(F) = SSE(X1) - SSE(X1, X2) = SSR(X2 | X1)




Hence:

(8.6)  F^* = \frac{SSR(X_2 \mid X_1)}{1} \div \frac{SSE(X_1, X_2)}{n - 3}

For our example in Table 8.6, we have:

    SSR(X2 | X1) = 25,326,684 - 19,303,908 = 6,022,776

and hence:

    F^* = \frac{6,022,776}{1} \div \frac{19,303,908}{7} = 2.18



Recall from Table 8.6a that the t* statistic for testing β2 = 0 is t* = 1.477.
Since:

    (t*)² = (1.477)² = 2.18 = F*

we see that the two test statistics are equivalent. We already knew this holds for
simple regression, and now we can see it also holds for multiple regression.

The F* test statistic (8.6) to test whether or not β2 = 0 is called a partial F
test statistic to distinguish it from the F* statistic in (7.30b) for testing whether
all βk = 0, i.e., whether or not there is a regression relation between Y and the
set of independent variables. The latter test is called the overall F test.

The test statistic (8.6) for the partial F test indicates clearly that a test on
whether or not β2 = 0 is a marginal test, given that X1 is in the model. Similarly,
a test on whether or not β1 = 0 is a marginal test, given that X2 is in the model. It
is apparent now why the simultaneous tests with the t* statistics for the two
different regression coefficients both led to the conclusion that the regression
coefficient equals zero. The independent variables X1 and X2 in Table 8.5 are
highly positively correlated; indeed, the coefficient of simple correlation is r12
= +.99. Hence, if X1 is already in the model, adding X2 achieves little more
reduction in the variation of Y, and SSR(X2 | X1) is small. When SSR(X2 | X1) is
small, the test for β2 leads to a small partial F test statistic F* and therefore to
the conclusion that β2 = 0.

Similarly, the explanation why the t test led to the conclusion that β1 = 0 is
that X1 is highly positively correlated with X2, and X2 is assumed to be in the
model when the t test for β1 is employed.

Thus, despite the fact that there is a clear relation between the dependent
variable Y and the set of independent variables X1 and X2, the separate t tests
indicated the respective regression coefficients equal zero because each test
considers only the marginal contribution of the independent variable, given that the
other is included in the model.
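The ski resort results can be reproduced from the data in Table 8.5. The following is a sketch under stated assumptions: NumPy is available, the helper name `fit` is hypothetical, and the estimated standard deviations s(bk) are obtained from MSE times the inverse of X'X, the usual least squares result. The critical values 4.74 and 2.84 are taken from the text rather than computed.

```python
import numpy as np

# Data from Table 8.5 (ski resort example).
x1 = np.array([10.5, 2.5, 13.1, 4.0, 14.7, 3.6, 7.1, 22.5, 17.0, 6.4])
x2 = np.array([2200, 1000, 3250, 1475, 3800, 1200, 1900, 5575, 4200, 1850],
              dtype=float)
y = np.array([19929, 5839, 23696, 9881, 30011, 7241, 11634, 45684, 36476,
              12068], dtype=float)
n = len(y)


def fit(y, *predictors):
    """Least squares fit with intercept; returns design matrix, coefficients, SSE."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return X, coef, float(resid @ resid)


X_full, b, sse_full = fit(y, x1, x2)      # full model (8.4)
_, _, sse_x1 = fit(y, x1)                 # reduced model (8.5)
ssto = float(((y - y.mean()) ** 2).sum())

mse = sse_full / (n - 3)
f_overall = ((ssto - sse_full) / 2) / mse           # overall F test

cov_b = mse * np.linalg.inv(X_full.T @ X_full)      # estimated variances of b
t_stats = b[1:] / np.sqrt(np.diag(cov_b)[1:])       # t* for b1 and b2

f_partial = ((sse_x1 - sse_full) / 1) / mse         # partial F test (8.6)
r12 = np.corrcoef(x1, x2)[0, 1]

print(f"overall F* = {f_overall:.1f}  (compare with F(.95; 2, 7) = 4.74)")
print(f"t* for b1, b2 = {t_stats[0]:.3f}, {t_stats[1]:.3f}  (compare with 2.84)")
print(f"partial F* = {f_partial:.2f},  (t* for b2)^2 = {t_stats[1] ** 2:.2f}")
print(f"r12 = {r12:.2f}")
```

The overall F test rejects H0 decisively, while neither t* exceeds the Bonferroni critical value; the partial F statistic for X2 equals the square of its t* statistic.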

Note 

We have just seen that it is possible that a set of independent variables is related to the
dependent variable, yet all of the individual tests on the regression coefficients will lead to
the conclusion that they equal zero because of the multicollinearity among the
independent variables. This apparently paradoxical result is also possible under special
circumstances when there is no multicollinearity among the independent variables. The special
circumstances are not likely to be found in practice, however.


A  final  comment 

The discussion in this section has indicated that the choice of the particular set
of independent variables which are to be included in the model is highly
important and that in the presence of multicollinearity, the interpretation of regression
coefficients and regression sums of squares must be undertaken with caution.
The regression coefficients are affected not only by the other intercorrelated
variables in the model but also by intercorrelated variables omitted from the
model. For instance, an analyst was perplexed to find in a regression of territory
company sales on territory population size, per capita income, and some other
independent variables that the confidence interval for the regression coefficient
for population size indicated it is negative. The analyst should have considered
some of the omitted independent variables in search of an explanation. A
consultant noted that the analyst did not include the major competitor's market
penetration in the model. Since the competitor was most active and effective in
territories with large populations, and thereby kept company sales down in these
territories, the result of the omission of this independent variable from the model
was a negative coefficient for the population size variable.

In  view  of  the  importance  of  the  problems  caused  by  multicollinearity  among 
the  independent  variables,  we  take  up  this  subject  in  more  detail  in  Chapter  11. 
We  consider  there  how  to  identify  the  presence  of  multicollinearity  and  take  up 
several  measures  which  may  help  to  overcome  some  of  the  problems  caused  by 
multicollinearity. 


8.2  DECOMPOSITION  OF  SSR  INTO  EXTRA  SUMS  OF  SQUARES 
Extra  sums  of  squares 

We defined the extra sum of squares SSR(X1 | X2) in (8.2):

(8.7a)  SSR(X1 | X2) = SSE(X2) - SSE(X1, X2)

Likewise, we defined in (8.1):

(8.7b)  SSR(X2 | X1) = SSE(X1) - SSE(X1, X2)

These extra sums of squares reflect the reduction in the error sum of squares by
adding an independent variable to the model, given that another independent
variable is already in the model.

Any  reduction  in  the  error  sum  of  squares,  of  course,  is  equal  to  the  same 
increase  in  the  regression  sum  of  squares  since  always: 


SSTO  =  SSR  +  SSE 




Hence, an extra sum of squares can also be thought of as the increase in the
regression sum of squares achieved by introducing the new variable. We can
therefore state, equivalently:

(8.8a)  SSR(X1 | X2) = SSR(X1, X2) - SSR(X2)

(8.8b)  SSR(X2 | X1) = SSR(X1, X2) - SSR(X1)

We show in Figure 8.2 a schematic representation of the extra sums of squares
for our body fat example. The total bar on the left represents SSTO. The
unshaded component of this bar is SSR(X2), and the combined shaded area
represents SSE(X2). The latter area in turn is the combination of the extra sum of
squares SSR(X1 | X2) and the error sum of squares SSE(X1, X2) when both X1 and
X2 are included in the model. Similarly, the bar on the right in Figure 8.2 shows
the decomposition containing SSR(X2 | X1). Note in both cases how the extra sum
of squares can be viewed either as a reduction in the error sum of squares or as
an increase in the regression sum of squares when the second variable is added to
the regression model.


FIGURE 8.2  Schematic representation of extra sums of squares: body fat example

[Figure: two bars, each representing SSTO = 495.39.
 Left bar:  SSR(X2) = 381.97;  SSR(X1 | X2) = 3.47;   SSE(X2) = 113.42
 Right bar: SSR(X1) = 352.27;  SSR(X2 | X1) = 33.17;  SSE(X1) = 143.12
 Common to both bars: SSR(X1, X2) = 385.44;  SSE(X1, X2) = 109.95]


Extensions for three or more X variables are straightforward. For instance, we
define:

(8.9)  SSR(X3 | X1, X2) = SSE(X1, X2) - SSE(X1, X2, X3)

SSR(X3 | X1, X2) measures the reduction in the remaining variation of Y which is
achieved by introducing X3 into the regression model when X1 and X2 are already
in the model.

Each extra sum of squares involving the addition of one independent variable
into the regression model has associated with it one degree of freedom.




Decomposition  of  SSR 

We can now obtain a variety of decompositions for the regression sum of
squares SSR. Let us consider the case of three X variables. We begin with the
identity (3.48a) for variable X1:

(8.10)   SSTO = SSR(X1) + SSE(X1)

where the notation now shows explicitly that X1 is the X variable in the model.
Replacing SSE(X1) by its equivalent in (8.7b), we obtain:

(8.10a)  SSTO = SSR(X1) + SSR(X2 | X1) + SSE(X1, X2)

Replacing SSE(X1, X2) by its equivalent in (8.9), we obtain:

(8.10b)  SSTO = SSR(X1) + SSR(X2 | X1) + SSR(X3 | X1, X2)
                + SSE(X1, X2, X3)

Since we have the same identity for multiple regression with three independent
variables as in (8.10) for a single independent variable, namely:

(8.11)   SSTO = SSR(X1, X2, X3) + SSE(X1, X2, X3)

equation (8.10b) therefore reduces to:

(8.12)   SSR(X1, X2, X3) = SSR(X1) + SSR(X2 | X1) + SSR(X3 | X1, X2)

Thus, the regression sum of squares has been decomposed into marginal
components, each associated with one degree of freedom. Of course, the order of the
independent variables is arbitrary, and other orders can be developed. For
instance:

(8.13)   SSR(X1, X2, X3) = SSR(X3) + SSR(X1 | X3) + SSR(X2 | X1, X3)

Indeed, we can define extra sums of squares for two or more independent
variables at a time and obtain still other decompositions. For instance, we define:

(8.14)   SSR(X2, X3 | X1) = SSE(X1) - SSE(X1, X2, X3)

Thus, SSR(X2, X3 | X1) represents the reduction in the variation of Y gained when
X2 and X3 are added to the model already containing X1. There are two degrees of
freedom associated with SSR(X2, X3 | X1), as may be seen from the following
relation which is based directly on the definitions of the extra sums of squares:

(8.14a)  SSR(X2, X3 | X1) = SSR(X2 | X1) + SSR(X3 | X1, X2)

Thus, extra sums of squares for two or more X variables can be obtained from
extra sums of squares where one X variable is added at a time.

With SSR(X2, X3 | X1) we can then make use of the decomposition:

(8.15)   SSR(X1, X2, X3) = SSR(X1) + SSR(X2, X3 | X1)

It is obvious that the number of possible decompositions becomes vast as the
number of independent variables increases. Table 8.7 contains the ANOVA table
for one possible decomposition for the case of three independent variables, and
Table 8.8 contains two possible decompositions for our earlier body fat example.
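The decompositions (8.12) through (8.15) are algebraic identities, so they can be checked on any data set with three X variables. The sketch below assumes NumPy and uses simulated data that are purely hypothetical (not from the text); the helper names `sse`, `ssr`, and `extra` are likewise introduced only for illustration.

```python
import numpy as np

# Hypothetical simulated data, made up for this illustration only.
rng = np.random.default_rng(0)
n = 30
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)    # correlated with x1
x3 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.7 * x2 + 0.4 * x3 + rng.normal(size=n)
cols = {1: x1, 2: x2, 3: x3}


def sse(*idx):
    """SSE for the model containing an intercept and the listed X variables."""
    X = np.column_stack([np.ones(n)] + [cols[i] for i in idx])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


ssto = sse()                      # intercept-only model: SSE equals SSTO


def ssr(*idx):
    """Regression sum of squares, from SSTO = SSR + SSE."""
    return ssto - sse(*idx)


def extra(added, given):
    """Extra sum of squares SSR(added | given)."""
    return sse(*given) - sse(*given, *added)


lhs = ssr(1, 2, 3)
order_123 = ssr(1) + extra((2,), (1,)) + extra((3,), (1, 2))     # (8.12)
order_312 = ssr(3) + extra((1,), (3,)) + extra((2,), (1, 3))     # (8.13)
two_at_once = extra((2, 3), (1,))                                # (8.14)
one_at_a_time = extra((2,), (1,)) + extra((3,), (1, 2))          # (8.14a)

print(lhs, order_123, order_312, ssr(1) + two_at_once)
```

The individual terms differ from one ordering to another when the X variables are correlated, but each ordering sums to the same SSR(X1, X2, X3).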


Uses  of  extra  sums  of  squares  for  tests  concerning  regression  coefficients 

One of the major uses of extra sums of squares is for conducting tests
concerning regression coefficients without having to fit both the full and reduced models

TABLE 8.7  Example of ANOVA table with decomposition of SSR for three
independent variables

Source of
Variation      SS                     df       MS
Regression     SSR(X1, X2, X3)        3        MSR(X1, X2, X3)
  X1           SSR(X1)                1        MSR(X1)
  X2 | X1      SSR(X2 | X1)           1        MSR(X2 | X1)
  X3 | X1, X2  SSR(X3 | X1, X2)       1        MSR(X3 | X1, X2)
Error          SSE(X1, X2, X3)        n - 4    MSE(X1, X2, X3)
Total          SSTO                   n - 1


TABLE 8.8  ANOVA tables with different decompositions of SSR:
body fat example

(a)
Source of
Variation      SS                       df    MS
Regression     SSR(X1, X2)  = 385.44     2    192.72
  X1           SSR(X1)      = 352.27     1    352.27
  X2 | X1      SSR(X2 | X1) =  33.17     1     33.17
Error          SSE(X1, X2)  = 109.95    17      6.47
Total          SSTO         = 495.39    19

(b)
Source of
Variation      SS                       df    MS
Regression     SSR(X1, X2)  = 385.44     2    192.72
  X2           SSR(X2)      = 381.97     1    381.97
  X1 | X2      SSR(X1 | X2) =   3.47     1      3.47
Error          SSE(X1, X2)  = 109.95    17      6.47
Total          SSTO         = 495.39    19





separately. Many computer packages provide extra sums of squares in any
desired order when fitting a regression model. Often, the decomposition provided
is that corresponding to the order in which the X variables are entered in the
regression fit.

In our ski resort example, for instance, the independent variables were entered
in the order X1 and X2 and the computer output yielded the following results:

Source of
Variation      SS               df    MS
Regression     1,623,730,176     2    811,865,088
  X1           1,617,707,400     1    1,617,707,400
  X2 | X1          6,022,776     1    6,022,776
Error             19,303,908     7    2,757,701


Hence, to test whether or not $\beta_2 = 0$, we need not actually fit the reduced model
since the partial F test statistic (8.6) can be calculated immediately from the
above results:

$$F^* = \frac{SSR(X_2 \mid X_1)}{1} \div \frac{SSE(X_1, X_2)}{n - 3} = \frac{6{,}022{,}776}{1} \div \frac{19{,}303{,}908}{7} = 2.18$$

As  we  shall  see  later,  judicious  choices  in  obtaining  extra  sums  of  squares  will 
permit  a  variety  of  tests  on  the  regression  coefficients  without  requiring  extra 
computer  runs  for  fitting  reduced  models. 
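
The arithmetic of this partial F test can be checked with a few lines of code. The following is only a sketch: the variable names are ours, and the figures are those printed in the ski resort output above.

```python
# Partial F statistic for H0: beta_2 = 0, computed from the sequential
# sums of squares in the ski resort output (variable names are illustrative).
ssr_x2_given_x1 = 6_022_776      # extra sum of squares SSR(X2 | X1), 1 df
sse_full = 19_303_908            # SSE(X1, X2)
df_error = 7                     # n - 3

mse_full = sse_full / df_error                 # MSE of the full model
f_star = (ssr_x2_given_x1 / 1) / mse_full      # partial F statistic
print(round(f_star, 2))
```

No reduced model is fitted here; the extra sum of squares supplies the numerator directly.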


8.3  COEFFICIENTS  OF  PARTIAL  DETERMINATION 

A  coefficient  of  multiple  determination  R2,  it  will  be  recalled,  measures  the 
proportionate  reduction  in  the  variation  of  Y  achieved  by  the  introduction  of  the 
entire  set  of  X  variables  considered  in  the  model.  A  coefficient  of  partial  determi¬ 
nation,  in  contrast,  measures  the  marginal  contribution  of  one  X  variable,  when 
all  others  are  already  included  in  the  model. 


Two  independent  variables 

Let  us  consider  a  first-order  multiple  regression  model  with  two  independent 
variables,  as  given  in  (7.1): 

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$$

$SSE(X_2)$ measures the variation in Y when $X_2$ is included in the model.
$SSE(X_1, X_2)$ measures the variation in Y when both $X_1$ and $X_2$ are included in the
model. Hence, the relative marginal reduction in the variation in Y associated
with $X_1$ when $X_2$ is already in the model is:

$$\frac{SSE(X_2) - SSE(X_1, X_2)}{SSE(X_2)}$$

This measure is the coefficient of partial determination between Y and $X_1$, given
that $X_2$ is in the model. We denote this measure by $r^2_{Y1.2}$:

(8.16)    $r^2_{Y1.2} = \frac{SSE(X_2) - SSE(X_1, X_2)}{SSE(X_2)} = 1 - \frac{SSE(X_1, X_2)}{SSE(X_2)}$


Using (8.7a), we can express the coefficient of partial determination in terms of
an extra sum of squares:

(8.16a)    $r^2_{Y1.2} = \frac{SSR(X_1 \mid X_2)}{SSE(X_2)}$

$r^2_{Y1.2}$ thus measures the proportionate reduction in the variation of Y remaining
after $X_2$ is included in the model which is gained by also including $X_1$ in the
model.

The coefficient of partial determination between Y and $X_2$, given that $X_1$ is in
the model, is defined:

(8.17)    $r^2_{Y2.1} = \frac{SSR(X_2 \mid X_1)}{SSE(X_1)}$


For our earlier body fat example, these two coefficients of partial determination
are (Tables 8.4 and 8.8):

$$r^2_{Y1.2} = \frac{3.47}{113.42} = .031$$

$$r^2_{Y2.1} = \frac{33.17}{143.12} = .232$$

Thus, when $X_1$ here is added to the model containing $X_2$, the error sum of squares
$SSE(X_2)$ is reduced by 3.1 percent. Correspondingly, when $X_2$ is added to the
model containing $X_1$, the error sum of squares $SSE(X_1)$ is reduced by 23.2
percent.
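
These two coefficients can be recomputed from the sums of squares in Table 8.8 alone. The sketch below assumes only those tabled values; the variable names are ours.

```python
# Coefficients of partial determination for the body fat example,
# computed from the sums of squares in Table 8.8 (names are illustrative).
ssto = 495.39
ssr_x1, ssr_x2 = 352.27, 381.97              # SSR(X1), SSR(X2)
ssr_x2_given_x1 = 33.17                      # SSR(X2 | X1)
ssr_x1_given_x2 = 3.47                       # SSR(X1 | X2)

sse_x1 = ssto - ssr_x1                       # SSE(X1)
sse_x2 = ssto - ssr_x2                       # SSE(X2)

r2_y1_2 = ssr_x1_given_x2 / sse_x2           # formula (8.16a)
r2_y2_1 = ssr_x2_given_x1 / sse_x1           # formula (8.17)
print(round(r2_y1_2, 3), round(r2_y2_1, 3))
```

Each denominator is the error sum of squares of the model that already contains the other variable, which is why the two coefficients differ so much even though both extra sums of squares come from the same full model.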


General  case 


The generalization of coefficients of partial determination to three or more
independent variables in the model is immediate. For instance:

(8.18a)    $r^2_{Y1.23} = \frac{SSR(X_1 \mid X_2, X_3)}{SSE(X_2, X_3)}$

(8.18b)    $r^2_{Y2.13} = \frac{SSR(X_2 \mid X_1, X_3)}{SSE(X_1, X_3)}$

(8.18c)    $r^2_{Y3.12} = \frac{SSR(X_3 \mid X_1, X_2)}{SSE(X_1, X_2)}$

Note that in the subscripts to $r^2$, the entries to the left of the dot show in turn
the variable taken as the response and the variable being added. The entries to the
right of the dot show the X variables already in the model.


Comments 

1. The coefficients of partial determination can take on values between 0 and 1, as the
definitions readily indicate.

2. A coefficient of partial determination can be interpreted as a coefficient of simple
determination. Consider a multiple regression model with two independent variables.
Suppose we regress Y on $X_2$ and obtain the residuals:

$$Y_i - \hat{Y}_i(X_2)$$

where $\hat{Y}_i(X_2)$ denotes the fitted values of Y when $X_2$ is in the model.
Suppose we further regress $X_1$ on $X_2$ and obtain the residuals:

$$X_{i1} - \hat{X}_{i1}(X_2)$$

where $\hat{X}_{i1}(X_2)$ denotes the fitted values of $X_1$ in the regression of $X_1$ on $X_2$. The coefficient
of simple determination $r^2$ between these two sets of residuals equals the coefficient of
partial determination $r^2_{Y1.2}$. Thus, this coefficient measures the relation between Y and $X_1$
when both of these have been adjusted for their linear relationships to $X_2$.
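
The equivalence stated in Comment 2 can be illustrated numerically. The sketch below uses hypothetical simulated data (not a data set from the text) and assumes NumPy is available; the helper `fit` is our own.

```python
import numpy as np

rng = np.random.default_rng(0)               # hypothetical data, not from the text
n = 50
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

def fit(columns, response):
    """Least squares fit with intercept; returns (fitted values, SSE)."""
    X = np.column_stack([np.ones(len(response))] + list(columns))
    b, *_ = np.linalg.lstsq(X, response, rcond=None)
    fitted = X @ b
    return fitted, float(np.sum((response - fitted) ** 2))

# Definition (8.16): relative reduction in SSE when X1 joins a model with X2.
_, sse_x2 = fit([x2], y)
_, sse_x1x2 = fit([x1, x2], y)
r2_partial = (sse_x2 - sse_x1x2) / sse_x2

# Comment 2: squared simple correlation between the two sets of residuals.
y_hat, _ = fit([x2], y)
x1_hat, _ = fit([x2], x1)
r2_resid = np.corrcoef(y - y_hat, x1 - x1_hat)[0, 1] ** 2

print(r2_partial, r2_resid)                  # the two values agree
```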


Coefficients  of  partial  correlation 

The  square  root  of  the  coefficient  of  partial  determination  is  called  the  coeffi¬ 
cient  of  partial  correlation.  This  coefficient  is  frequently  used  in  practice,  al¬ 
though  it  does  not  have  as  clear  a  meaning  as  the  coefficient  of  partial  determina¬ 
tion. 

For our body fat example, we have:

$$r_{Y1.2} = \sqrt{.031} = .176$$
$$r_{Y2.1} = \sqrt{.232} = .482$$

Usually the partial correlation coefficient is given the same sign as that of the
corresponding regression coefficient in the fitted regression function. For our
body fat example, we had (Table 8.4) $b_1 = +.2224$ and $b_2 = +.6594$. Since
these are both positive, each of the partial correlation coefficients above is taken
as positive.

Partial  correlation  coefficients  are  frequently  used  in  computer  routines  for 
finding  the  best  independent  variable  to  be  selected  next  for  inclusion  in  the 
regression  model.  We  shall  discuss  this  use  in  Chapter  12. 


Note 

The coefficients of partial determination can be expressed in terms of simple or other
partial correlation coefficients. For example:

(8.19)    $r^2_{Y2.1} = \frac{(r_{Y2} - r_{12} r_{Y1})^2}{(1 - r^2_{12})(1 - r^2_{Y1})}$

(8.20)    $r^2_{Y2.13} = \frac{(r_{Y2.3} - r_{12.3} r_{Y1.3})^2}{(1 - r^2_{12.3})(1 - r^2_{Y1.3})}$

Extensions are straightforward.

8.4  TESTING  HYPOTHESES  CONCERNING  REGRESSION 
COEFFICIENTS  IN  MULTIPLE  REGRESSION 

We  have  already  discussed  how  to  conduct  two  types  of  tests  concerning  the 
regression  coefficients  in  a  multiple  regression  model.  For  completeness,  we 
summarize  these  tests  here  and  then  take  up  some  additional  types  of  tests. 


Test whether all $\beta_k = 0$

This is the overall F test (7.30) of whether or not there is a regression relation
between the dependent variable Y and the set of independent variables. The
alternatives are:

(8.21)    $H_0\colon \beta_1 = \beta_2 = \cdots = \beta_{p-1} = 0$
          $H_a\colon$ not all $\beta_k$ $(k = 1, \ldots, p - 1)$ equal 0

and the test statistic is:

(8.22)    $F^* = \frac{SSR(X_1, \ldots, X_{p-1})}{p - 1} \div \frac{SSE(X_1, \ldots, X_{p-1})}{n - p} = \frac{MSR}{MSE}$

If $H_0$ holds, $F^* \sim F(p - 1, n - p)$. Large values of $F^*$ lead to conclusion $H_a$.


Test whether a single $\beta_k = 0$

This is the partial F test of whether a particular regression coefficient $\beta_k$
equals zero. The alternatives are:

(8.23)    $H_0\colon \beta_k = 0$
          $H_a\colon \beta_k \neq 0$

and the test statistic is:

(8.24)    $F^* = \frac{SSR(X_k \mid X_1, \ldots, X_{k-1}, X_{k+1}, \ldots, X_{p-1})}{1} \div \frac{SSE(X_1, \ldots, X_{p-1})}{n - p} = \frac{MSR(X_k \mid X_1, \ldots, X_{k-1}, X_{k+1}, \ldots, X_{p-1})}{MSE}$

If $H_0$ holds, $F^* \sim F(1, n - p)$. Large values of $F^*$ lead to conclusion $H_a$.
Computer packages which provide extra sums of squares permit use of this test
without having to fit the reduced model.

An equivalent test statistic is, as we have seen, (7.42b):

(8.25)    $t^* = \frac{b_k}{s(b_k)}$

If $H_0$ holds, $t^* \sim t(n - p)$. Large values of $|t^*|$ lead to conclusion $H_a$.

Since  the  two  tests  are  equivalent,  the  choice  is  usually  made  in  terms  of 
available  information  provided  by  the  computer  package  output. 
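
The equivalence of the two tests rests on the identity $(t^*)^2 = F^*$. The following sketch illustrates it on hypothetical simulated data, assuming NumPy; it is an illustration, not a derivation.

```python
import numpy as np

rng = np.random.default_rng(2)               # hypothetical data, not from the text
n, p = 30, 3                                 # p parameters: beta_0, beta_1, beta_2
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.0 * x1 + 0.3 * x2 + rng.normal(size=n)

X_full = np.column_stack([np.ones(n), x1, x2])
X_red = np.column_stack([np.ones(n), x1])    # reduced model without X2

def sse(X, response):
    """Error sum of squares for the least squares fit of response on X."""
    b, *_ = np.linalg.lstsq(X, response, rcond=None)
    return float(np.sum((response - X @ b) ** 2))

sse_f, sse_r = sse(X_full, y), sse(X_red, y)
mse = sse_f / (n - p)

# Partial F statistic (8.24) for H0: beta_2 = 0.
f_star = ((sse_r - sse_f) / 1) / mse

# t statistic (8.25): b_2 divided by its estimated standard deviation.
xtx_inv = np.linalg.inv(X_full.T @ X_full)
b = xtx_inv @ X_full.T @ y
t_star = b[2] / np.sqrt(mse * xtx_inv[2, 2])

print(t_star ** 2, f_star)                   # the two values agree
```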


Test  whether  some  regression  coefficients  equal  zero 

Sometimes  we  wish  to  determine  whether  or  not  some  regression  coefficients 
equal  zero.  The  approach  is  that  of  a  general  linear  test,  and  no  new  problems 
arise.  We  first  illustrate  the  approach  by  an  example  and  then  state  the  general 
test  statistic. 


Example. We consider the multiple regression application in Table 8.9. The
data pertain to 14 different computer simulations conducted in designing the
layout for a chemical warehouse. The dependent variable is CPU time (the operating
time required by the computer's central processing unit to run the simulation),
the first independent variable is number of trials in the simulation, and the
second independent variable is number of statements in the computer program.
Since it is hypothesized that the effect of number of statements ($X_2$) on CPU time
(Y) may be curvilinear, the model being considered is:

(8.26)    $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i2}^2 + \varepsilon_i$    Full model

TABLE 8.9 Data for computer simulation example

                            Number of
              Number of     Statements                    CPU Time
Simulation    Trials        in Program                    (seconds)
    i         X_i1          X_i2           X_i2^2         Y_i
    1         550           458            209,764        445.37
    2         600           152             23,104        408.88
    3         200           635            403,225        264.61
    4          50           128             16,384         73.90
    5         350           100             10,000        246.07
    6         300           550            302,500        312.00
    7         200           577            332,929        250.36
    8         175           234             54,756        179.95
    9         200           500            250,000        243.76
   10         200           491            241,081        243.63
   11         175           580            336,400        240.72
   12         700           135             18,225        462.60
   13         800           162             26,244        531.45
   14         650           176             30,976        445.48



It is desired that we test whether the number of statements variable can be
dropped from the model. Hence, we wish to choose between the alternatives:

$H_0\colon \beta_2 = \beta_3 = 0$
$H_a\colon$ not both $\beta_2$ and $\beta_3$ equal zero

We first fit the full model and obtain (results are shown in Table 8.10a):

$$SSE(F) = SSE(X_1, X_2, X_2^2) = 44.1$$

The model under $H_0$ is:

(8.27)    $Y_i = \beta_0 + \beta_1 X_{i1} + \varepsilon_i$    Reduced model

When we fit the reduced model, we obtain (Table 8.10b):

$$SSE(R) = SSE(X_1) = 17{,}240.3$$

The general test statistic (3.68):

$$F^* = \frac{SSE(R) - SSE(F)}{(n - 2) - (n - 4)} \div \frac{SSE(F)}{n - 4}$$

can be simplified here because:

$$SSE(R) - SSE(F) = SSE(X_1) - SSE(X_1, X_2, X_2^2) = SSR(X_2, X_2^2 \mid X_1)$$

Hence, we can write:

(8.28)    $F^* = \frac{SSR(X_2, X_2^2 \mid X_1)}{2} \div \frac{SSE(F)}{n - 4}$

TABLE 8.10 Computer results for computer simulation example

(a) Regression of Y on X1, X2, and X2^2

$\hat{Y} = 7.028 + .595X_1 + .323X_2 - .000173X_2^2$

Source of
Variation      SS                                 df     MS
Regression     SSR(X1, X2, X2^2) = 214,691.2      3      71,563.7
Error          SSE(X1, X2, X2^2) =      44.1      10     4.41
Total          SSTO              = 214,735.3      13

(b) Regression of Y on X1

$\hat{Y} = 122.71 + .511X_1$

Source of
Variation      SS                       df     MS
Regression     SSR(X1) = 197,495.1      1      197,495.1
Error          SSE(X1) =  17,240.3      12     1,436.7
Total          SSTO    = 214,735.4      13




For our example, we have (Table 8.10):

$$SSR(X_2, X_2^2 \mid X_1) = 17{,}240.3 - 44.1 = 17{,}196.2$$

$$F^* = \frac{17{,}196.2}{2} \div \frac{44.1}{10} = 1{,}950$$

Suppose the $\alpha$ risk is to be .01. We require F(.99; 2, 10) = 7.56. Since $F^* =
1{,}950 > 7.56$, we conclude $H_a$, that the number of statements variable should
not be dropped from the model.
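
The full-versus-reduced comparison can be reproduced from the data in Table 8.9. The sketch below assumes NumPy and uses our own helper `sse`; the critical value 7.56 is the one quoted in the text, and the computed sums of squares should match Table 8.10 up to rounding.

```python
import numpy as np

# Table 8.9: number of trials (X1), number of statements (X2), CPU time (Y).
x1 = np.array([550, 600, 200, 50, 350, 300, 200,
               175, 200, 200, 175, 700, 800, 650], dtype=float)
x2 = np.array([458, 152, 635, 128, 100, 550, 577,
               234, 500, 491, 580, 135, 162, 176], dtype=float)
y = np.array([445.37, 408.88, 264.61, 73.90, 246.07, 312.00, 250.36,
              179.95, 243.76, 243.63, 240.72, 462.60, 531.45, 445.48])
n = len(y)

def sse(columns):
    """Error sum of squares for a least squares fit of Y with intercept."""
    X = np.column_stack([np.ones(n)] + list(columns))
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ b) ** 2))

sse_full = sse([x1, x2, x2 ** 2])            # full model (8.26), n - 4 df
sse_red = sse([x1])                          # reduced model (8.27), n - 2 df

# General linear test statistic (8.28) with 2 and n - 4 degrees of freedom.
f_star = ((sse_red - sse_full) / 2) / (sse_full / (n - 4))
reject_h0 = f_star > 7.56                    # F(.99; 2, 10) as quoted in the text
print(sse_full, sse_red, f_star, reject_h0)
```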

Actually,  we  did  not  need  to  fit  the  reduced  model  separately  to  conduct  the 
test;  we  used  this  approach  only  to  explain  the  logic  of  the  test.  The  computer  run 
for  fitting  the  full  model  provided  the  following  decomposition  of  SSR: 

Component            SS
X1                   197,495
X2 | X1               17,053
X2^2 | X1, X2            144

Since, by (8.14a):

$$SSR(X_2, X_2^2 \mid X_1) = SSR(X_2 \mid X_1) + SSR(X_2^2 \mid X_1, X_2)$$

we can obtain directly:

$$SSR(X_2, X_2^2 \mid X_1) = 17{,}053 + 144 = 17{,}197$$

The  difference  in  the  final  digit  is  due  to  rounding  effects. 
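
A sequential decomposition of this kind can be obtained by fitting a nested series of models and differencing their error sums of squares. The sketch below assumes NumPy and the Table 8.9 data; the helper `sse` is our own, and a regression package would normally report these quantities directly.

```python
import numpy as np

# Table 8.9: number of trials (X1), number of statements (X2), CPU time (Y).
x1 = np.array([550, 600, 200, 50, 350, 300, 200,
               175, 200, 200, 175, 700, 800, 650], dtype=float)
x2 = np.array([458, 152, 635, 128, 100, 550, 577,
               234, 500, 491, 580, 135, 162, 176], dtype=float)
y = np.array([445.37, 408.88, 264.61, 73.90, 246.07, 312.00, 250.36,
              179.95, 243.76, 243.63, 240.72, 462.60, 531.45, 445.48])
n = len(y)

def sse(columns):
    """Error sum of squares for a least squares fit of Y with intercept."""
    X = np.column_stack([np.ones(n)] + list(columns))
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ b) ** 2))

ssto = float(np.sum((y - y.mean()) ** 2))
sse_1 = sse([x1])
sse_12 = sse([x1, x2])
sse_full = sse([x1, x2, x2 ** 2])

ssr_x1 = ssto - sse_1                        # SSR(X1)
ssr_x2_given_x1 = sse_1 - sse_12             # SSR(X2 | X1)
ssr_x2sq_given_x1_x2 = sse_12 - sse_full     # SSR(X2^2 | X1, X2)

# By (8.14a) the last two components add to SSR(X2, X2^2 | X1).
ssr_joint = ssr_x2_given_x1 + ssr_x2sq_given_x1_x2
print(ssr_x1, ssr_x2_given_x1, ssr_x2sq_given_x1_x2, ssr_joint)
```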

In our example, the regression model contained both $X_2$ and $X_2^2$ terms.
Computational difficulties can at times arise in such cases when obtaining the least
squares regression coefficients. Chapter 9 discusses polynomial regression models
and how to avoid these computational difficulties.


Test statistic. For the general multiple regression model:

(8.29)    $Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$    Full model

we wish to test:

(8.30)    $H_0\colon \beta_q = \beta_{q+1} = \cdots = \beta_{p-1} = 0$
          $H_a\colon$ not all of the $\beta$'s in $H_0$ equal zero

where for convenience, we arrange the model so that the last p - q coefficients
are the ones to be tested. We first fit the full model and thereby obtain
$SSE(X_1, \ldots, X_{p-1})$. Then we fit the reduced model:

(8.31)    $Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_{q-1} X_{i,q-1} + \varepsilon_i$    Reduced model

and obtain $SSE(X_1, \ldots, X_{q-1})$. Finally, we set up the general linear test statistic
(3.68), which here is:

(8.32)    $F^* = \frac{SSE(X_1, \ldots, X_{q-1}) - SSE(X_1, \ldots, X_{p-1})}{(n - q) - (n - p)} \div \frac{SSE(X_1, \ldots, X_{p-1})}{n - p}$

or equivalently:

(8.32a)    $F^* = \frac{SSR(X_q, \ldots, X_{p-1} \mid X_1, \ldots, X_{q-1})}{p - q} \div MSE(F)$


Note that test statistic (8.32a) actually encompasses the two earlier cases. If
q = 1, the test is whether all regression coefficients equal zero. If q = p - 1, the
test is whether a single regression coefficient equals zero. Also note that test
statistic (8.32a) can be calculated without having to fit the reduced model if the
computer program provides the needed extra sums of squares:

(8.33)    $SSR(X_q, \ldots, X_{p-1} \mid X_1, \ldots, X_{q-1}) = SSR(X_q \mid X_1, \ldots, X_{q-1}) + \cdots + SSR(X_{p-1} \mid X_1, \ldots, X_{p-2})$


Other  tests 


Other types of tests are occasionally required. For instance, for the full model
containing three X variables:

(8.34)    $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i$    Full model

we might wish to test:

(8.35)    $H_0\colon \beta_1 = \beta_2$
          $H_a\colon \beta_1 \neq \beta_2$

The procedure would be to fit the full model (8.34), and then the reduced
model:

(8.36)    $Y_i = \beta_0 + \beta_c (X_{i1} + X_{i2}) + \beta_3 X_{i3} + \varepsilon_i$    Reduced model

where $\beta_c$ denotes the common coefficient for $\beta_1$ and $\beta_2$ and $X_{i1} + X_{i2}$ is the
corresponding new independent variable. We then use the general $F^*$ test statistic
(3.68) with 1 and n - 4 degrees of freedom. Since this test does not involve
alternatives where one or more regression coefficients equal zero, extra sums of
squares are not applicable and the reduced model must be fitted for the test.
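
The procedure for testing $\beta_1 = \beta_2$ can be sketched as follows. The data are hypothetical (simulated with equal true coefficients), NumPy is assumed, and the helper `sse` is our own.

```python
import numpy as np

rng = np.random.default_rng(3)               # hypothetical data, not from the text
n = 25
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 + 2.0 * x2 - 1.0 * x3 + rng.normal(size=n)

def sse(columns):
    """Error sum of squares for a least squares fit of y with intercept."""
    X = np.column_stack([np.ones(n)] + list(columns))
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ b) ** 2))

sse_f = sse([x1, x2, x3])                    # full model (8.34), n - 4 df
sse_r = sse([x1 + x2, x3])                   # reduced model (8.36), n - 3 df

# General linear test statistic (3.68) with 1 and n - 4 degrees of freedom.
f_star = ((sse_r - sse_f) / 1) / (sse_f / (n - 4))
print(sse_f, sse_r, f_star)
```

The reduced model is fitted by constructing the new variable $X_{i1} + X_{i2}$ explicitly, since no extra sum of squares from the full-model fit supplies the numerator here.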


8.5  MATRIX  FORMULATION  OF  GENERAL  LINEAR  TEST 

The general linear test is based on the test statistic (3.68):

(8.37)    $F^* = \frac{SSE(R) - SSE(F)}{df_R - df_F} \div \frac{SSE(F)}{df_F}$

which, when $H_0$ holds, follows the F distribution with $df_R - df_F$ degrees of
freedom for the numerator and $df_F$ degrees of freedom for the denominator. We
now explain how to represent this test statistic in matrix form.




Full  model 

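The full regression model with p - 1 predictor variables is given by (7.18):

(8.38)    $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$

The least squares estimators for the full model will now be denoted by $\mathbf{b}_F$ and
are, as before, given by (7.21):

(8.39)    $\mathbf{b}_F = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$

Also, the error sum of squares is given by (7.27):

(8.40)    $SSE(F) = (\mathbf{Y} - \mathbf{X}\mathbf{b}_F)'(\mathbf{Y} - \mathbf{X}\mathbf{b}_F) = \mathbf{Y}'\mathbf{Y} - \mathbf{b}_F'\mathbf{X}'\mathbf{Y}$

which has associated with it $df_F = n - p$ degrees of freedom.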


Statement  of  hypothesis  H0 

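A linear test hypothesis $H_0$ is represented in matrix form as follows:

(8.41)    $H_0\colon \underset{s \times p}{\mathbf{C}}\ \underset{p \times 1}{\boldsymbol{\beta}} = \underset{s \times 1}{\mathbf{h}}$

where $\mathbf{C}$ is a specified s x p matrix of rank s and $\mathbf{h}$ is a specified s x 1 vector.

Example 1. The regression model contains two X variables, and we wish to
test $H_0\colon \beta_1 = 2$. Then:

$$\underset{1 \times 3}{\mathbf{C}} = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix} \qquad \underset{1 \times 1}{\mathbf{h}} = \begin{bmatrix} 2 \end{bmatrix}$$

and we have:

$$H_0\colon \mathbf{C}\boldsymbol{\beta} = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} = \begin{bmatrix} 2 \end{bmatrix}$$

or $H_0\colon \beta_1 = 2$.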


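Example 2. The regression model contains two X variables, and we wish to
test $H_0\colon \beta_1 = \beta_2 = 0$. Then:

$$\underset{2 \times 3}{\mathbf{C}} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad \underset{2 \times 1}{\mathbf{h}} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

and we have:

$$H_0\colon \mathbf{C}\boldsymbol{\beta} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

or $H_0\colon \beta_1 = 0,\ \beta_2 = 0$.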


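Example 3. The regression model contains three X variables, and we wish to
test $H_0\colon \beta_1 = \beta_2$. Then:

$$\underset{1 \times 4}{\mathbf{C}} = \begin{bmatrix} 0 & 1 & -1 & 0 \end{bmatrix} \qquad \underset{1 \times 1}{\mathbf{h}} = \begin{bmatrix} 0 \end{bmatrix}$$

and we have:

$$H_0\colon \mathbf{C}\boldsymbol{\beta} = \begin{bmatrix} 0 & 1 & -1 & 0 \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix} = \begin{bmatrix} 0 \end{bmatrix}$$

or $H_0\colon \beta_1 - \beta_2 = 0$.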

Reduced  model 

The  reduced  model  is: 

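(8.42)    $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$    where $\mathbf{C}\boldsymbol{\beta} = \mathbf{h}$

It can be shown that the least squares estimators under the reduced model, to be
denoted by $\mathbf{b}_R$, are:

(8.43)    $\mathbf{b}_R = \mathbf{b}_F - (\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}'(\mathbf{C}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}')^{-1}(\mathbf{C}\mathbf{b}_F - \mathbf{h})$

and the error sum of squares is:

(8.44)    $SSE(R) = (\mathbf{Y} - \mathbf{X}\mathbf{b}_R)'(\mathbf{Y} - \mathbf{X}\mathbf{b}_R)$

which has associated with it $df_R = n - (p - s)$ degrees of freedom.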

Test  statistic 

It  can  be  shown  that  the  difference  SSE(R )  —  SSE(F)  can  be  expressed  as 
follows: 

(8.45)    $SSE(R) - SSE(F) = (\mathbf{C}\mathbf{b}_F - \mathbf{h})'(\mathbf{C}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}')^{-1}(\mathbf{C}\mathbf{b}_F - \mathbf{h})$

which has associated with it $df_R - df_F = (n - p + s) - (n - p) = s$ degrees of
freedom.

Hence, the test statistic is:

(8.46)    $F^* = \frac{SSE(R) - SSE(F)}{s} \div \frac{SSE(F)}{n - p}$

where SSE(R) - SSE(F) is given by (8.45) and SSE(F) is given by (8.40).

To confirm that the degrees of freedom associated with SSE(R) - SSE(F) are
s, consider the earlier three examples.

1. In Example 1, s = 1. This is consistent with the numerator degrees of freedom
in test statistic (8.24).

2. In Example 2, s = 2. This is consistent with the numerator degrees of freedom
in test statistic (8.22).

3. In Example 3, s = 1. This is consistent with the numerator degrees of freedom
in the test statistic for the example on page 293.


Note 

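The least squares estimators $\mathbf{b}_R$ under the reduced model, given in (8.43), can be
derived by minimizing the least squares criterion $Q = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$ subject to the
constraint $\mathbf{C}\boldsymbol{\beta} - \mathbf{h} = \mathbf{0}$, using Lagrangian multipliers.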
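
The matrix results (8.39) through (8.46) can be verified numerically. The sketch below applies them to the hypothesis of Example 3 using hypothetical simulated data; NumPy is assumed, and explicit matrix inverses are used only to mirror the formulas, not as a recommendation for numerical work.

```python
import numpy as np

rng = np.random.default_rng(4)               # hypothetical data, not from the text
n = 25
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])   # columns: 1, X1, X2, X3
Y = X @ np.array([1.0, 2.0, 2.0, -1.0]) + rng.normal(size=n)

C = np.array([[0.0, 1.0, -1.0, 0.0]])        # Example 3: H0: beta_1 - beta_2 = 0
h = np.array([0.0])
s, p = C.shape

xtx_inv = np.linalg.inv(X.T @ X)
b_F = xtx_inv @ X.T @ Y                                  # (8.39)
sse_F = float((Y - X @ b_F) @ (Y - X @ b_F))             # (8.40)

M = np.linalg.inv(C @ xtx_inv @ C.T)                     # (C (X'X)^-1 C')^-1
d = C @ b_F - h                                          # C b_F - h
b_R = b_F - xtx_inv @ C.T @ M @ d                        # (8.43)
sse_R = float((Y - X @ b_R) @ (Y - X @ b_R))             # (8.44)

quad_form = float(d @ M @ d)                             # right side of (8.45)
f_star = (quad_form / s) / (sse_F / (n - p))             # (8.46)
print(sse_R - sse_F, quad_form, f_star)
```

The constrained estimator satisfies $\mathbf{C}\mathbf{b}_R = \mathbf{h}$ exactly, and the difference in error sums of squares agrees with the quadratic form, so the reduced model never has to be fitted separately.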


PROBLEMS 

8.1.  A speaker stated in a workshop on applied regression analysis: "In business and
the social sciences, some degree of multicollinearity in survey data is practically
inevitable." Does this statement apply equally to experimental data?

8.2.  Refer  to  the  Zarthan  Company  example  on  page  247.  The  company’s  sales  man¬ 
ager  has  suggested  that  the  predictive  ability  of  the  model  could  be  greatly  im¬ 
proved  if  promotional  expenditures  were  added  to  the  model,  since  these  expendi¬ 
tures  are  known  to  have  a  substantial  impact  on  sales.  The  company  allocates  its 
total  promotional  budget  proportionately  to  the  target  population  in  the  districts. 
Thus,  a  district  containing  4.7  percent  of  the  total  target  population  receives  4.7 
percent  of  the  total  promotional  budget.  Evaluate  the  sales  manager’s  suggestion. 

8.3.  Refer  to  Brand  preference  Problem  7.8. 

a.  Fit the first-order simple linear regression model (3.1) for relating brand
liking (Y) to moisture content ($X_1$). State the fitted regression function.

b.  Compare the estimated regression coefficient for moisture content obtained in
part (a) with the corresponding coefficient obtained in Problem 7.8a. What do
you find?

c.  Does $SSR(X_1)$ equal $SSR(X_1 \mid X_2)$ here? If not, is the difference substantial?

d.  Calculate the coefficient of simple correlation between $X_1$ and $X_2$. What bearing
does this have on your findings in parts (b) and (c)?

8.4.  Refer  to  Chemical  shipment  Problem  7.12. 

a.  Fit the first-order simple linear regression model (3.1) for relating number of
minutes required to handle shipment (Y) to total weight of shipment ($X_2$).
State the fitted regression function.

b.  Compare the estimated regression coefficient for total weight of shipment
obtained in part (a) with the corresponding coefficient obtained in
Problem 7.12a. What do you find?

c.  Does $SSR(X_2)$ equal $SSR(X_2 \mid X_1)$ here? If not, is the difference substantial?

d.  Calculate the coefficient of simple correlation between $X_1$ and $X_2$. What bearing
does this have on your findings in parts (b) and (c)?




8.5.  Refer  to  Patient  satisfaction  Problem  7.17. 

a.  Fit the first-order linear regression model (7.1) for relating patient satisfaction
(Y) to patient's age ($X_1$) and severity of illness ($X_2$). State the fitted regression
function.

b.  Compare the estimated regression coefficients for patient's age and severity
of illness obtained in part (a) with the corresponding coefficients obtained in
Problem 7.17a. What do you find?

c.  Does $SSR(X_1)$ equal $SSR(X_1 \mid X_3)$ here? Does $SSR(X_2)$ equal $SSR(X_2 \mid X_3)$?

d.  Calculate the coefficients of simple correlation between pairs of $X_1$, $X_2$, and
$X_3$. What bearing do these have on your findings in parts (b) and (c)?

8.6.  Refer to Chemical shipment Problem 7.12. Does $SSR(X_1) + SSR(X_2 \mid X_1)$ equal
$SSR(X_2) + SSR(X_1 \mid X_2)$ here? Must this always be the case?

8.7.  Refer to Brand preference Problem 7.8. Test whether $X_2$ can be dropped from the
regression model given that $X_1$ is retained. Use the $F^*$ test statistic and level of
significance .01. State the alternatives, decision rule, and conclusion.

8.8.  Refer to Chemical shipment Problem 7.12. Test whether $X_1$ can be dropped from
the regression model given that $X_2$ is retained. Use the $F^*$ test statistic and $\alpha$ =
.05. State the alternatives, decision rule, and conclusion.

8.9.  Refer to Patient satisfaction Problem 7.17. Test whether $X_3$ can be dropped from
the regression model given that $X_1$ and $X_2$ are retained. Use the $F^*$ test statistic
and level of significance .025. State the alternatives, decision rule, and conclusion.

8.10.  Refer to the work crew productivity example on page 272.

a.  Calculate $r_{Y1}$, $r_{Y2}$, $r_{12}$, $r_{Y1.2}$, $r_{Y2.1}$, and $R^2$. Explain what each coefficient
measures and interpret your results.

b.  Are any of the results obtained in part (a) special because the two independent
variables are uncorrelated?

c.  Obtain the standardized regression coefficients. How do you interpret these
coefficients? Do they have a special meaning here because the independent
variables are uncorrelated? (Hint: Obtain $r_{Y1}$, $r_{Y2}$.)

8.11.  Refer to Brand preference Problem 7.8. Calculate $r_{Y1}$, $r_{Y2}$, $r_{12}$, $r_{Y1.2}$, $r_{Y2.1}$, and
$R^2$. Explain what each coefficient measures and interpret your results.

8.12.  Refer to Chemical shipment Problem 7.12. Calculate $r_{Y1}$, $r_{Y2}$, $r_{12}$, $r_{Y1.2}$, $r_{Y2.1}$,
and $R^2$. Explain what each coefficient measures and interpret your results.

8.13.  Refer to Patient satisfaction Problem 7.17.

a.  Calculate $r_{Y1}$, $r_{Y1.2}$, and $r_{Y1.23}$. How is the degree of linear association
between Y and $X_1$ affected as other independent variables have already been
included in the model?

b.  Make a similar analysis as in part (a) for the degree of linear association
between Y and $X_2$. Are your findings similar to those in part (a) for Y and $X_1$?

8.14.  Refer to the computer simulation example on page 290. An observer states that
both the number of trials variable ($X_1$) and the second-order term for the number
of statements variable ($X_2^2$) can be dropped from model (8.26). Conduct the
appropriate test using a level of significance of .01. State the alternatives, decision rule,
and conclusion.




8.15.  Refer to Patient satisfaction Problem 7.17. Test whether both $X_2$ and $X_3$ can be
dropped from the regression model given that $X_1$ is retained. Use $\alpha$ = .025. State
the alternatives, decision rule, and conclusion.

8.16.  Refer to Mathematicians salaries Problem 7.20. Test whether both $X_1$ and $X_3$ can
be dropped from the regression model given that $X_2$ is retained; use $\alpha$ = .01. State
the alternatives, decision rule, and conclusion.




EXERCISES 

8.17.  a.  Define each of the following extra sums of squares: (1) $SSR(X_5 \mid X_1)$; (2)
$SSR(X_3, X_4 \mid X_1)$; (3) $SSR(X_4 \mid X_1, X_2, X_3)$.
b.  For  a  multiple  regression  model  with  five  X  variables,  what  is  the  relevant 
extra  sum  of  squares  for  testing  whether  or  not  ft  =  0?  Whether  or  not 

ft  =  ft  =  0? 

8.18.  Show  that: 

a.  $SSR(X_1, X_2, X_3, X_4) = SSR(X_1) + SSR(X_2, X_3 \mid X_1) + SSR(X_4 \mid X_1, X_2, X_3)$

b.  $SSR(X_1, X_2, X_3, X_4) = SSR(X_2, X_3) + SSR(X_1 \mid X_2, X_3) + SSR(X_4 \mid X_1, X_2, X_3)$

8.19.  Refer  to  Brand  preference  Problem  7.8. 

a.  Regress  Y  on  X2  using  the  simple  linear  regression  model  (3.1)  and  obtain  the 
residuals . 

b.  Regress  X1  on  X2  using  the  simple  linear  regression  model  (3.1)  and  obtain 
the  residuals. 

c.  Calculate the coefficient of simple correlation between the two sets of residuals
and show that it equals $r_{Y1.2}$.

8.20.  An  undergraduate  working  for  a  campus  apparel  shop  serving  student  customers 
studied  the  relation  between  monthly  allowance  received  by  customer  (Xi),  num¬ 
ber  of  years  customer  is  in  college  (X2),  and  dollar  sales  to  customer  to  date  (Y). 
The  predictive  model  considered  was: 

Y,  =  ft  +  faXn  +  filX,!  +  P3Xji  +  8,- 

State  the  reduced  models  for  testing  whether  or  not:  (1)  j8i  =  j83  =  0,  (2)  j80  =  0, 
(3)  ft  =  5,  (4)  ft  =  10,  (5)  ft  -  ft. 

8.21.  Refer  to  Exercise  8.20.  For  each  of  the  cases,  state  the  hypothesis  H0  using  the 
matrix  formulation  (8.41). 

8.22.  The  following  regression  model  is  being  considered  in  a  water  resources  study: 

Y,  =  ft  +  ftXa  +  ftXi2  +  ftXaXa  +  ftVx^"+  8,- 

State  the  reduced  models  for  testing  whether  or  not:  (1)  ft  =  ft  =  0,  (2)  ft  =  0, 
(3)  ft  =  ft  =  5,  (4)  ft  =  7. 

8.23.  Refer  to  Exercise  8.22.  For  each  of  the  cases,  state  the  hypothesis  H0  using  the 
matrix  formulation  (8.41). 

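8.24.  (Calculus needed.) Derive the least squares estimator under the reduced model
(8.43), where $\mathbf{C}\boldsymbol{\beta} = \mathbf{h}$. [Hint: The Lagrangian function is:

$L = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) + \boldsymbol{\lambda}'(\mathbf{C}\boldsymbol{\beta} - \mathbf{h})$, where $\boldsymbol{\lambda}' = (\lambda_1, \ldots, \lambda_s)$.]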


8.25.  Derive (8.45). [Hint: Show that $SSE(R) - SSE(F) = (\mathbf{b}_F - \mathbf{b}_R)'\mathbf{X}'\mathbf{X}(\mathbf{b}_F - \mathbf{b}_R)$
and obtain an expression for $\mathbf{b}_F - \mathbf{b}_R$ from (8.43).]


PROJECTS

8.26.  Refer to the SMSA data set. For predicting the number of active physicians (Y) in
an SMSA, it has been decided to include total population ($X_1$) and total personal
income ($X_2$) as independent variables. The question now is whether an additional
independent variable would be helpful in the model, and if so, which variable
would be most helpful. Assume that a first-order multiple regression model is
appropriate.

a.  For each of the following variables, calculate the coefficient of partial
determination given that $X_1$ and $X_2$ are included in the model: land area ($X_3$),
percent of population 65 or older ($X_4$), number of hospital beds ($X_5$), and total
serious crimes ($X_6$).

b.  On the basis of the results in part (a), which of the four additional independent
variables is best? Is the extra sum of squares associated with this variable
larger than those for the other three variables?

c.  Using the $F^*$ test statistic, test whether or not the variable determined to be
best in part (b) is helpful in the regression model when $X_1$ and $X_2$ are included
in the model; use $\alpha$ = .01. State the alternatives, decision rule, and conclusion.
Would the $F^*$ test statistics for the other three potential independent
variables be as large as the one here? Discuss.

8.27.  Refer to the SENIC data set. For predicting the average length of stay of patients
in a hospital (Y), it has been decided to include age ($X_1$) and infection risk ($X_2$) as
independent variables. The question now is whether an additional independent
variable would be helpful in the model, and if so, which variable would be most
helpful. Assume that a first-order multiple regression model is appropriate.

a.  For each of the following variables, calculate the coefficient of partial
determination given that $X_1$ and $X_2$ are included in the model: routine culturing
ratio ($X_3$), average daily census ($X_4$), number of nurses ($X_5$), and available
facilities and services ($X_6$).

b.  On the basis of the results in part (a), which of the four additional independent
variables is best? Is the extra sum of squares associated with this variable
larger than those for the other three variables?

c.  Using the $F^*$ test statistic, test whether or not the variable determined to be
best in part (b) is helpful in the regression model when $X_1$ and $X_2$ are included
in the model; use $\alpha$ = .05. State the alternatives, decision rule, and conclusion.
Would the $F^*$ test statistics for the other three potential independent
variables be as large as the one here? Discuss.

8.28.  Refer to Exercise 7.31. It is desired to test whether or not $\beta_1 = \beta_2$. Using matrix
methods, obtain SSE(R) - SSE(F) according to (8.45).


Polynomial  regression 


In  this  chapter,  we  consider  one  important  type  of  curvilinear  response  model, 
namely,  the  polynomial  regression  model.  This  is  the  most  frequently  used  curvi¬ 
linear  response  model  in  practice,  because  of  its  ease  in  handling  as  a  special  case 
of  the  general  linear  regression  model  (7.18).  First,  we  discuss  some  commonly 
used  polynomial  regression  models.  Then  we  present  two  cases  to  illustrate  some 
of  the  major  problems  encountered  with  polynomial  regression. 

9.1  POLYNOMIAL  REGRESSION  MODELS 

Polynomial regression models can contain one, two, or more than two
independent variables. Further, the independent variable can be present in
various powers. We illustrate now some major possibilities.

One  independent  variable — second  order 

The  model: 

(9.1)  $Y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i$

where:

$x_i = X_i - \bar{X}$


is called a second-order model with one independent variable because the single
independent variable appears to the first and second powers. Note that the
independent variable is expressed as a deviation around its mean $\bar{X}$, and
that the ith observation deviation is denoted by $x_i$. The reason for using
deviations around the mean in polynomial regression models is that $X$, $X^2$,
and higher-power terms often will be highly correlated. This can cause serious
computational difficulties when the X'X matrix is inverted for estimating the
regression coefficients. Expressing the independent variable as a deviation from
its mean reduces the multicollinearity substantially and tends to avoid
computational difficulties.
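
The effect of centering can be checked numerically. The following is a minimal sketch; the seven equally spaced levels X = 0, 1, ..., 6 are an assumption chosen purely for illustration. It compares the correlation between the first-power and second-power terms before and after the variable is expressed as a deviation around its mean.

```python
import numpy as np

# Hypothetical levels of the independent variable, chosen for illustration only.
X = np.arange(7, dtype=float)          # 0, 1, ..., 6
x = X - X.mean()                       # deviations around the mean

# Correlation between the linear and quadratic terms, raw versus centered.
corr_raw = np.corrcoef(X, X**2)[0, 1]
corr_centered = np.corrcoef(x, x**2)[0, 1]

print(f"corr(X, X^2) = {corr_raw:.3f}")
print(f"corr(x, x^2) = {abs(corr_centered):.3f}")
```

When the levels are placed symmetrically about the mean, as here, the centered terms are uncorrelated. For asymmetric levels the correlation is typically reduced rather than eliminated.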

The  regression  coefficients  in  polynomial  regression  are  frequently  written  in 
a  slightly  different  fashion,  to  reflect  the  pattern  of  the  exponents: 

(9.1a)  $Y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \varepsilon_i$

We  shall  employ  this  latter  notation  in  this  chapter. 

The  response  function  for  model  (9.1a)  is: 

(9.2)  $E(Y) = \beta_0 + \beta_1 x + \beta_{11} x^2$

which  is  a  parabola  and  is  frequently  called  a  quadratic  response  function.  Figure 
9.1  contains  two  examples  of  second-order  polynomial  response  functions. 

The regression coefficient $\beta_0$ represents the mean response of Y when x = 0,
i.e., when $X = \bar{X}$. The regression coefficient $\beta_1$ is often called the
linear effect coefficient while $\beta_{11}$ is called the quadratic effect coefficient.

FIGURE  9.1  Examples  of  second-order  polynomial  response  functions 

(a)  (b) 


Uses of second-order model.  The second-order polynomial response function
(9.2) has two basic types of uses:

1.  When the true response function is indeed a second-degree polynomial,
containing additive linear and quadratic effect components.

2.  When the true response function is unknown (or complex) but a second-order
polynomial is a good approximation to the true function.

The  second  type  of  use  is  the  more  common  one,  but  it  entails  a  special 
danger,  that  of  extrapolation.  Consider  again  Figure  9.1a.  This  response  function 
may  fit  the  data  at  hand  very  well.  If,  however,  information  about  E(Y)  is  sought 
for  a  larger  value  of  x,  extrapolation  of  this  response  function  leads  to  the  result 
shown  in  Figure  9.2,  namely,  a  turning  down  of  the  response  function,  which 
may  not  be  in  accord  with  reality.  Polynomial  regressions  of  all  types,  especially 
those  of  higher  order,  share  this  danger  of  extrapolation.  They  may  provide  good 
fits  for  the  data  at  hand,  but  may  turn  in  unexpected  directions  when  extrapolated 
beyond  the  range  of  the  data. 

FIGURE  9.2  Extrapolation  of  second-order  polynomial  response  function  in  Figure 
9.1a 


One  independent  variable — third  order 

The  model: 

(9.3)  $Y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \beta_{111} x_i^3 + \varepsilon_i$

where:

$x_i = X_i - \bar{X}$

is  a  third-order  model  with  one  independent  variable.  The  response  function  for 
model  (9.3)  is: 

(9.4)  $E(Y) = \beta_0 + \beta_1 x + \beta_{11} x^2 + \beta_{111} x^3$

Figure  9.3  contains  two  examples  of  third-order  polynomial  response  functions. 




FIGURE  9.3  Examples  of  third-order  polynomial  response  functions 

(a)  (b) 


One  independent  variable — higher  orders 

Polynomial  models  with  the  independent  variable  present  in  higher  powers 
than  the  third  are  not  often  employed.  The  interpretation  of  the  coefficients 
becomes  difficult  for  such  models,  and  they  may  be  highly  erratic  for  even  small 
extrapolations.  It  must  be  recognized  in  this  connection  that  a  polynomial  model 
of  sufficiently  high  order  can  always  be  found  to  fit  the  data  perfectly.  For 
instance,  the  fitted  polynomial  regression  function  for  one  independent  variable 
of order n - 1 will pass through all n observed Y values. One needs to be wary
therefore of using high-order polynomials for the sole purpose of obtaining a
good  fit.  Such  regression  functions  may  not  show  clearly  the  basic  elements  of 
the  regression  relation  between  X  and  Y  and  may  lead  to  erratic  extrapolations. 
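
The point about perfect fits can be illustrated with a short sketch. The five data values below are hypothetical and have no polynomial pattern by construction; a polynomial of order n - 1 = 4 nevertheless reproduces every observation exactly, and then behaves arbitrarily just outside the data.

```python
import numpy as np

# Hypothetical data: n = 5 observations with no underlying polynomial pattern.
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Y = np.array([3.1, 0.4, 5.2, 1.7, 4.9])
n = len(X)

# Design matrix with columns 1, X, X^2, ..., X^(n-1); it is square here.
design = np.vander(X, N=n, increasing=True)
b = np.linalg.solve(design, Y)         # coefficients of the order n - 1 polynomial

fitted = design @ b
max_abs_residual = np.max(np.abs(Y - fitted))
print(max_abs_residual)                # essentially zero: a "perfect" fit

# Extrapolating one unit beyond the largest observed X.
y_new = np.vander([5.0], N=n, increasing=True) @ b
print(y_new[0])
```

The zero residuals say nothing about the regression relation itself; they follow from having as many coefficients as observations.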

Two  independent  variables — second  order 

The  model: 

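(9.5)  $Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{11} x_{i1}^2 + \beta_{22} x_{i2}^2 + \beta_{12} x_{i1} x_{i2} + \varepsilon_i$

where:

$x_{i1} = X_{i1} - \bar{X}_1$
$x_{i2} = X_{i2} - \bar{X}_2$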

is  a  second-order  model  with  two  independent  variables.  The  response  surface  is: 

(9.6)  $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2$

which is the equation of a conic section. Note that model (9.5) contains separate
linear and quadratic components for each of the two independent variables and a
cross-product term. The latter represents the interaction effects between $x_1$ and
$x_2$, as we noted in Chapter 7. The coefficient $\beta_{12}$ is often called the
interaction effect coefficient.

The  second-order  response  surface  for  two  independent  variables  in  (9.6) 
represents  the  two  basic  types  of  surfaces  illustrated  in  Figure  7.3.  Stationary 
and  rising  ridges  constitute  limiting  cases  of  these  two  basic  types  of  response 
surfaces. 

Usually,  it  is  easiest  to  portray  the  second-order  response  surface  (9.6)  in 
terms  of  contour  lines.  Figure  9.4  contains  a  representation  of  the  response 
function  in  terms  of  contour  curves: 

(9.7)  $E(Y) = 1{,}740 - 4x_1^2 - 3x_2^2 - 3x_1 x_2$

Note that this response surface has a maximum at $x_1 = 0$ and $x_2 = 0$.
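
That the stationary point of (9.7) at the origin is a maximum can be confirmed from the second derivatives. The sketch below is illustrative only: it writes out the constant matrix of second partial derivatives of (9.7) by hand and checks that both of its eigenvalues are negative.

```python
import numpy as np

def mean_response(x1, x2):
    """Response surface (9.7)."""
    return 1740.0 - 4.0 * x1**2 - 3.0 * x2**2 - 3.0 * x1 * x2

# First partial derivatives of (9.7) are -8*x1 - 3*x2 and -6*x2 - 3*x1,
# which both vanish at x1 = x2 = 0. The matrix of second partial
# derivatives is therefore constant:
hessian = np.array([[-8.0, -3.0],
                    [-3.0, -6.0]])
eigenvalues = np.linalg.eigvalsh(hessian)

print(eigenvalues)                 # both negative, so the origin is a maximum
print(mean_response(0.0, 0.0))     # mean response at the maximum
```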

FIGURE 9.4  Example of a quadratic response surface:
$E(Y) = 1{,}740 - 4x_1^2 - 3x_2^2 - 3x_1 x_2$


Polynomial  models  in  two  (or  more)  independent  variables  are  well  adapted  to 
situations  where  the  response  function  is  unknown  and  a  suitable  model  is  to  be 
developed  empirically. 

Note 

The cross-product term $\beta_{12} x_1 x_2$ in (9.6) is considered to be a second-order
term, the same as $\beta_{11} x_1^2$ or $\beta_{22} x_2^2$. The reason can be seen readily
by writing the latter terms as $\beta_{11} x_1 x_1$ and $\beta_{22} x_2 x_2$, respectively.


Three  independent  variables — second  order 

The  second-order  model  with  three  independent  variables  is: 


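(9.8)  $Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3}$
       $\quad + \beta_{11} x_{i1}^2 + \beta_{22} x_{i2}^2 + \beta_{33} x_{i3}^2$
       $\quad + \beta_{12} x_{i1} x_{i2} + \beta_{13} x_{i1} x_{i3} + \beta_{23} x_{i2} x_{i3} + \varepsilon_i$

where:

$x_{i1} = X_{i1} - \bar{X}_1$
$x_{i2} = X_{i2} - \bar{X}_2$
$x_{i3} = X_{i3} - \bar{X}_3$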

The  response  surface  for  this  model  is: 

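(9.9)  $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{33} x_3^2$
       $\quad + \beta_{12} x_1 x_2 + \beta_{13} x_1 x_3 + \beta_{23} x_2 x_3$


The coefficients $\beta_{12}$, $\beta_{13}$, and $\beta_{23}$ are interaction effects
coefficients for interactions between pairs of independent variables.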


Use  of  polynomial  regression  models 

Fitting  of  polynomial  regression  models  presents  no  new  problems  since,  as 
we  have  seen  in  Chapter  7,  they  are  special  cases  of  the  general  linear  regression 
model  (7.18).  Hence,  all  earlier  results  on  fitting  apply,  as  do  the  earlier  results 
on  making  inferences. 

When  using  a  polynomial  regression  model  as  an  approximation  to  the  true 
regression  function,  one  will  often  fit  a  second-order  or  third-order  model  and 
then  explore  whether  a  lower-order  model  is  adequate.  For  instance,  with  one 
independent  variable,  the  model: 

$Y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \beta_{111} x_i^3 + \varepsilon_i$

may be fitted with the hope that the cubic term and perhaps even the quadratic
term can be dropped. Thus, one would wish to test whether or not $\beta_{111} = 0$, or
whether or not both $\beta_{11} = 0$ and $\beta_{111} = 0$. Similar tests would often be
conducted with polynomial regression models for two or more independent variables.

9.2  EXAMPLE  1— ONE  INDEPENDENT  VARIABLE 

We  illustrate  now  some  of  the  major  types  of  analyses  usually  conducted  with 
polynomial  regression  models  with  one  independent  variable. 

Setting 

A staff analyst for a cafeteria chain wishes to investigate the relation between
the number of self-service coffee dispensers in a cafeteria line and sales of
coffee. Fourteen cafeterias that are similar in such respects as volume of
business, type of clientele, and location are chosen for the experiment. The
number of self-service dispensers that are placed in the test cafeterias varies
from zero (coffee is dispensed here by a line attendant) to six and is assigned
randomly to each cafeteria.

Table  9.1  contains  the  results  of  the  experimental  study.  Sales  are  measured  in 
hundreds  of  gallons  of  coffee  sold. 


TABLE 9.1  Data for cafeteria coffee sales example

Cafeteria    Number of Dispensers    Coffee Sales (hundred gallons)
    i               $X_i$                       $Y_i$

    1                 0                         508.1
    2                 0                         498.4
    3                 1                         568.2
    4                 1                         577.3
    5                 2                         651.7
    6                 2                         657.0
    7                 3                         713.4
    8                 3                         697.5
    9                 4                         755.3
   10                 4                         758.9
   11                 5                         787.6
   12                 5                         792.1
   13                 6                         841.4
   14                 6                         831.8

Fitting  of  model 

The  analyst  believes  that  the  relation  between  sales  and  number  of  self-service 
dispensers  is  quadratic  in  the  range  of  observations;  sales  should  increase  as  the 
number  of  dispensers  is  greater,  but  if  the  space  is  cluttered  with  dispensers,  this 
increase  becomes  retarded.  Hence,  she  would  like  to  fit  the  quadratic  model: 

(9.10)  $Y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \varepsilon_i$

where:

$x_i = X_i - \bar{X}$

She further anticipates that the error terms $\varepsilon_i$ will be fairly normally
distributed with constant variance.

The Y and X matrices for this application are given in Table 9.2. Note that the
X matrix contains a column of 1's, a column of the independent variable
observations x (expressed as deviations around their mean $\bar{X} = 3$), and a
column of the $x^2$ values. From this point on, the calculations are routine. We
could do the matrix calculations manually, as illustrated in Chapter 7, or use a
computer multiple regression program. Since no new problems are encountered, we
simply present the basic computer output in Table 9.3, including needed extra
sums of squares and the $s^2(\mathbf{b})$ matrix.

TABLE 9.2  Data matrices for cafeteria coffee sales example

      Y                  X
                    1     x    $x^2$

    508.1           1    -3     9
    498.4           1    -3     9
    568.2           1    -2     4
    577.3           1    -2     4
    651.7           1    -1     1
    657.0           1    -1     1
    713.4           1     0     0
    697.5           1     0     0
    755.3           1     1     1
    758.9           1     1     1
    787.6           1     2     4
    792.1           1     2     4
    841.4           1     3     9
    831.8           1     3     9

TABLE 9.3  Regression results for cafeteria coffee sales example

(a)  Regression Coefficients

Regression      Estimated                  Estimated
Coefficient     Regression Coefficient     Standard Deviation       t*

$\beta_0$            705.474                    3.208              219.91
$\beta_1$             54.893                    1.050               52.28
$\beta_{11}$          -4.249                     .606               -7.01

(b)  Analysis of Variance

Source of Variation         SS         df         MS

Regression               171,773        2       85,887
   $x$                   168,741        1      168,741
   $x^2 \mid x$            3,033        1        3,033
Error                        679       11         61.7
Total                    172,453       13

(c)  $s^2(\mathbf{b})$ Matrix

    10.2912      0        -1.4702
      0         1.1026      0
    -1.4702      0          .3675


The  fitted  regression  function  is: 

(9.11)  $\hat{Y} = 705.47 + 54.89x - 4.25x^2$

This response function is plotted in Figure 9.5, together with the original data.
We show the horizontal scale expressed at the bottom in the deviation units x and
at  the  top  in  the  original  units  X. 
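
The results in Table 9.3 can be reproduced with a few lines of matrix code. The following is a minimal sketch using NumPy (the choice of library is ours, not the text's): it forms the X matrix of Table 9.2 from the data of Table 9.1, solves the normal equations X'Xb = X'Y, and computes MSE and the estimated variance-covariance matrix of the coefficients.

```python
import numpy as np

# Table 9.1: number of dispensers and coffee sales (hundred gallons).
X = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6], dtype=float)
Y = np.array([508.1, 498.4, 568.2, 577.3, 651.7, 657.0, 713.4,
              697.5, 755.3, 758.9, 787.6, 792.1, 841.4, 831.8])

x = X - X.mean()                                   # deviations; the mean is 3
design = np.column_stack([np.ones_like(x), x, x**2])

# Least squares estimates from the normal equations X'Xb = X'Y.
xtx = design.T @ design
b = np.linalg.solve(xtx, design.T @ Y)

residuals = Y - design @ b
n, p = design.shape
sse = residuals @ residuals
mse = sse / (n - p)
s2_b = mse * np.linalg.inv(xtx)                    # estimated variance-covariance matrix

print(np.round(b, 3))          # b0, b1, b11
print(round(float(sse), 1), round(float(mse), 1))
print(np.round(s2_b, 4))
```

Because the levels of x are symmetric about zero, the off-diagonal elements of X'X that involve the linear term vanish, which is why $b_1$ is uncorrelated with $b_0$ and $b_{11}$ in the output.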




FIGURE  9.5  Fitted  second-order  polynomial  regression — cafeteria  coffee  sales  example 


(Top horizontal axis: Number of Dispensers)


Algebraic  version  of  normal  equations.  The  algebraic  version  of  the  least 
squares  normal  equations: 


X'Xb  =  X'Y 

for  the  second-order  polynomial  model  (9.10)  can  be  readily  obtained  from 
(7.65) by replacing $X_{i1}$ by $x_i$ and $X_{i2}$ by $x_i^2$. Since $\sum x_i = 0$,
this yields the normal equations:

(9.12)  $\sum Y_i = n b_0 + b_{11} \sum x_i^2$
        $\sum x_i Y_i = b_1 \sum x_i^2 + b_{11} \sum x_i^3$
        $\sum x_i^2 Y_i = b_0 \sum x_i^2 + b_1 \sum x_i^3 + b_{11} \sum x_i^4$


Analysis  of  aptness  of  model 

Residual analysis.  To study the aptness of model (9.10) for her data, the
analyst plotted the residuals $e_i$ against the fitted values, as shown in Figure
9.6a, and also against the independent variable $x_i$ expressed in deviation
units, as shown in Figure 9.6b. We do not present the calculations of the $e_i$,
as these are routine.

FIGURE 9.6  Residual plots for cafeteria coffee sales example

(a)  Residual Plot against $\hat{Y}$ (horizontal axis: Fitted Value)
(b)  Residual Plot against x (horizontal axis: Deviation Units)

There are no systematic departures from 0 evident in the residuals as either
$\hat{Y}$ or x increases, suggesting that the quadratic response function is a
good fit. Figure 9.5 makes this point also. Further, there is no tendency in
Figures 9.6a and 9.6b for the spread in the residuals to vary systematically, so
it appears that the constant error variance assumption is reasonable. A normal
probability plot, not shown here, did not provide any strong evidence that the
distribution of the error terms is far from normal.

Based on this study of the aptness of the model, the analyst was willing to
conclude that the normal error model (9.10) with constant error variance is
appropriate here.

Test for quadratic response function.  Since there are two repeat observations
for each level of x, the analyst could have made a formal test of the aptness
of the model, the alternatives being:

(9.13)  $H_0$: $E(Y) = \beta_0 + \beta_1 x + \beta_{11} x^2$
        $H_a$: $E(Y) \neq \beta_0 + \beta_1 x + \beta_{11} x^2$

The basic ANOVA results were presented earlier in Table 9.3b. The pure error
sum of squares is obtained as follows from the data in Table 9.2:

$SSPE = (508.1 - 503.25)^2 + (498.4 - 503.25)^2 + (568.2 - 572.75)^2 + \cdots + (831.8 - 836.6)^2 = 292$

Note that $\bar{Y}_1 = 503.25$ for x = -3, $\bar{Y}_2 = 572.75$ for x = -2, and so on.
There are 14 - 7 = 7 degrees of freedom associated with SSPE. Hence, we have:

$MSPE = \frac{SSPE}{7} = \frac{292}{7} = 41.7$

Now  we  are  in  a  position  to  obtain  the  lack  of  fit  sum  of  squares  by  (4.12): 

SSLF  =  SSE  -  SSPE  =  679  -  292  =  387 

There are 7 - 3 = 4 degrees of freedom associated with SSLF. (Remember that
three parameters had to be estimated for the fitted regression equation.) Hence,
we have:

$MSLF = \frac{SSLF}{4} = \frac{387}{4} = 96.8$

Thus, test statistic (4.15) here is:

$F^* = \frac{MSLF}{MSPE} = \frac{96.8}{41.7} = 2.32$

Assuming the level of significance is to be .05, we require F(.95; 4, 7) = 4.12.
Since F* = 2.32 < 4.12, we conclude $H_0$, that the quadratic response function
is appropriate.
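
The lack of fit computations can be sketched in code as follows. This is an illustrative NumPy sketch, not the analyst's program; the critical value F(.95; 4, 7) = 4.12 is simply copied from the text rather than computed.

```python
import numpy as np

# Table 9.1 data (two cafeterias at each of seven dispenser levels).
X = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6], dtype=float)
Y = np.array([508.1, 498.4, 568.2, 577.3, 651.7, 657.0, 713.4,
              697.5, 755.3, 758.9, 787.6, 792.1, 841.4, 831.8])
x = X - X.mean()

# Error sum of squares for the fitted quadratic model (9.10).
design = np.column_stack([np.ones_like(x), x, x**2])
b, *_ = np.linalg.lstsq(design, Y, rcond=None)
sse = np.sum((Y - design @ b) ** 2)

# Pure error: squared deviations of Y around the mean at each level of x.
levels = np.unique(x)
sspe = sum(np.sum((Y[x == lv] - Y[x == lv].mean()) ** 2) for lv in levels)

n, p, c = len(Y), design.shape[1], len(levels)
sslf = sse - sspe                       # lack of fit sum of squares
mspe = sspe / (n - c)                   # 14 - 7 = 7 degrees of freedom
mslf = sslf / (c - p)                   # 7 - 3 = 4 degrees of freedom
f_star = mslf / mspe

f_critical = 4.12                       # F(.95; 4, 7), taken from the text
print(round(float(sspe), 1), round(float(sslf), 1), round(float(f_star), 2))
print("conclude H0" if f_star <= f_critical else "conclude Ha")
```

Since the sketch carries unrounded intermediate values, its last digit may differ slightly from the hand calculation above.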


Test whether $\beta_{11}$ equals zero


t test.  The analyst next studied whether the quadratic term could be dropped
from the model. She therefore wished to test:

(9.14)  $H_0$: $\beta_{11} = 0$
        $H_a$: $\beta_{11} \neq 0$

$H_0$ implies that there is no quadratic effect in the response function.
Table 9.3a indicates that:

$t^* = \frac{b_{11}}{s(b_{11})} = \frac{-4.249}{.606} = -7.012$

For a level of significance of .05, we require t(.975; 11) = 2.201. The decision
rule is:

If $|t^*| \leq 2.201$, conclude $H_0$
If $|t^*| > 2.201$, conclude $H_a$

Since $|t^*| = 7.012 > 2.201$, we conclude $H_a$, that a quadratic effect does exist,
so that the quadratic term should be retained in the model.


Partial F test.  The analyst could also have used the partial F test to choose
the appropriate conclusion in (9.14). Indeed, she had specified the order of
entering the variables x and $x^2$ into the computer regression fit so that she
would obtain the extra sums of squares $SSR(x)$ and $SSR(x^2 \mid x)$ in the output.
Utilizing the partial F test statistic (8.24) and the results in Table 9.3b, we
obtain:

$F^* = \frac{MSR(x^2 \mid x)}{MSE} = \frac{3{,}033}{61.7} = 49.2$

For a 5 percent level of significance, we need F(.95; 1, 11) = 4.84. Since
F* = 49.2 > 4.84, we are led to conclude $H_a$, as by the t test.

Note 


One  may  observe  here  the  relation  discussed  in  the  previous  chapter  between  the  t  and 
partial  F  tests  as  to  whether  a  regression  coefficient  equals  zero.  We  have  for  the  two  test 
statistics: 

$(t^*)^2 = (-7.012)^2 = 49.2 = F^*$
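
The equivalence of the two tests can also be checked directly from the data. In the sketch below (NumPy, with a helper function `sse_of` introduced here only for illustration), the extra sum of squares is obtained as the drop in SSE when the quadratic term is added to the linear model.

```python
import numpy as np

X = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6], dtype=float)
Y = np.array([508.1, 498.4, 568.2, 577.3, 651.7, 657.0, 713.4,
              697.5, 755.3, 758.9, 787.6, 792.1, 841.4, 831.8])
x = X - X.mean()
ones = np.ones_like(x)

def sse_of(design):
    """Return (SSE, coefficients) for a least squares fit of Y on the given columns."""
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return np.sum((Y - design @ coef) ** 2), coef

sse_reduced, _ = sse_of(np.column_stack([ones, x]))        # linear term only
full = np.column_stack([ones, x, x**2])
sse_full, b = sse_of(full)                                 # linear and quadratic terms

mse = sse_full / (len(Y) - 3)
f_star = (sse_reduced - sse_full) / mse                    # SSR(x^2 | x) has 1 df

s_b11 = np.sqrt(mse * np.linalg.inv(full.T @ full)[2, 2])  # estimated std. dev. of b11
t_star = b[2] / s_b11

print(round(float(t_star), 3), round(float(t_star**2), 2), round(float(f_star), 2))
```

With unrounded quantities the two statistics agree exactly; the value is close to, but not identical with, the 49.2 obtained above from rounded table entries.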


Estimation  of  regression  coefficients 

The analyst next wished to obtain confidence bounds on the two regression
coefficients $\beta_1$ and $\beta_{11}$ with family confidence coefficient .90. The
Bonferroni method is to be used, in view of its simplicity and the ease of
interpreting the results.

Here g = 2 statements are desired; hence, by (7.44a), we have:

$B = t(1 - .10/2(2); 11) = t(.975; 11) = 2.201$

From Table 9.3a, we find:

$b_1 = 54.893$       $s(b_1) = 1.050$
$b_{11} = -4.249$      $s(b_{11}) = .606$

The Bonferroni confidence intervals therefore are by (7.44):

$54.893 - 2.201(1.050) \leq \beta_1 \leq 54.893 + 2.201(1.050)$

or:

$52.58 \leq \beta_1 \leq 57.20$

$-4.249 - 2.201(.606) \leq \beta_{11} \leq -4.249 + 2.201(.606)$

or:

$-5.58 \leq \beta_{11} \leq -2.92$

The  analyst  was  satisfied  with  the  precision  of  these  two  statements,  feeling 
that  the  intervals  are  narrow  enough  to  give  her  reliable  simultaneous  information 
about  the  comparative  magnitudes  of  the  linear  and  quadratic  effects. 
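
The Bonferroni limits are simple enough to script. The sketch below uses only the estimates and estimated standard deviations of Table 9.3a; the multiple B = t(.975; 11) = 2.201 is copied from the text rather than computed, and the dictionary keys are labels chosen here for illustration.

```python
# Estimates and estimated standard deviations from Table 9.3a.
estimates = {"beta_1": (54.893, 1.050), "beta_11": (-4.249, 0.606)}

family_confidence = 0.90
g = len(estimates)                                     # number of statements
# Each interval uses the 1 - alpha/(2g) percentile of t with 11 degrees of freedom.
percentile = 1 - (1 - family_confidence) / (2 * g)     # .975 for g = 2
B = 2.201                                              # t(.975; 11), from the text

intervals = {name: (b - B * s, b + B * s) for name, (b, s) in estimates.items()}
for name, (lower, upper) in intervals.items():
    print(f"{name}: {lower:.2f} to {upper:.2f}")
```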




Coefficient  of  multiple  determination 

For  a  descriptive  measure  of  the  degree  of  relation  between  coffee  sales  and 
number  of  dispensing  machines,  the  analyst  calculated  the  coefficient  of  multiple 
determination  using  the  data  in  Table  9.3b: 

$R^2 = \frac{SSR}{SSTO} = \frac{171{,}773}{172{,}453} = .996$

This  measure  shows  that  the  variation  in  coffee  sales  is  reduced  by  99.6  percent 
when  the  quadratic  relation  to  the  number  of  dispensing  machines  is  utilized. 

Note  that  the  coefficient  of  multiple  determination  R2  is  the  relevant  measure 
here,  not  the  coefficient  of  simple  determination  r2,  since  model  (9.10)  is  a 
multiple  regression  model  even  though  it  contains  only  one  independent  variable. 
Sometimes  in  curvilinear  regression,  the  coefficient  of  multiple  correlation  R  is 
called  the  correlation  index. 

Estimation  of  mean  response 

The analyst was particularly interested in the mean response for $X_h = 3$
dispensing machines. She wished to estimate this mean response with a 98 percent
confidence coefficient. The proper interval estimate is given by (7.51). For our
example, where $x_h = X_h - \bar{X} = 3 - 3 = 0$, we have:

$\mathbf{X}_h = \begin{bmatrix} 1 \\ x_h \\ x_h^2 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$

The estimated mean response $\hat{Y}_h$ corresponding to $\mathbf{X}_h$ is by (7.47):

$\hat{Y}_h = \mathbf{X}_h' \mathbf{b} = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 705.474 \\ 54.893 \\ -4.249 \end{bmatrix} = 705.474$

Next, using the results in Table 9.3c for $s^2(\mathbf{b})$, we obtain when
substituting into (7.50):

$s^2(\hat{Y}_h) = \mathbf{X}_h' s^2(\mathbf{b}) \mathbf{X}_h = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 10.2912 & 0 & -1.4702 \\ 0 & 1.1026 & 0 \\ -1.4702 & 0 & .3675 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = 10.2912$

or:

$s(\hat{Y}_h) = \sqrt{10.2912} = 3.208$

We require t(.99; 11) = 2.718. Hence, we obtain:

$705.474 - 2.718(3.208) \leq E(Y_h) \leq 705.474 + 2.718(3.208)$

or:

$696.8 \leq E(Y_h) \leq 714.2$

With confidence coefficient .98, the analyst can conclude that the mean of the
distribution of coffee sales when three dispensing machines are used is
somewhere between 696.8 and 714.2 hundred gallons.
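
The interval estimate can be sketched as a small matrix computation. The code below is illustrative: it takes the coefficient vector and the $s^2(\mathbf{b})$ matrix as printed in Table 9.3, and copies t(.99; 11) = 2.718 from the text. Changing `X_h` gives the interval at another number of dispensers within the range of the data.

```python
import numpy as np

b = np.array([705.474, 54.893, -4.249])            # Table 9.3a
s2_b = np.array([[10.2912, 0.0, -1.4702],
                 [0.0, 1.1026, 0.0],
                 [-1.4702, 0.0, 0.3675]])          # Table 9.3c

X_bar = 3.0
X_h = 3.0                                          # number of dispensers of interest
x_h = X_h - X_bar                                  # deviation units
row = np.array([1.0, x_h, x_h**2])                 # the vector X_h'

y_hat_h = row @ b                                  # estimated mean response
s_y_hat_h = np.sqrt(row @ s2_b @ row)              # its estimated standard deviation

t_value = 2.718                                    # t(.99; 11), from the text
lower = y_hat_h - t_value * s_y_hat_h
upper = y_hat_h + t_value * s_y_hat_h
print(round(float(lower), 1), round(float(upper), 1))
```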


Regression  function  in  terms  of  X 

The analyst wishes for reporting purposes to express the fitted regression
function in terms of X rather than in terms of deviations $x = X - \bar{X}$. The
following formulas provide the appropriate coefficients for model (9.10); the
primes on the coefficients denote the new coefficients in terms of X:

(9.15a)  $b_0' = b_0 - b_1 \bar{X} + b_{11} \bar{X}^2$
(9.15b)  $b_1' = b_1 - 2 b_{11} \bar{X}$
(9.15c)  $b_{11}' = b_{11}$

For our example, where $\bar{X} = 3$, we obtain:

$b_0' = 705.474 - 54.893(3) + (-4.249)(3)^2 = 502.554$
$b_1' = 54.893 - 2(-4.249)(3) = 80.387$
$b_{11}' = -4.249$

so that the fitted regression function in terms of X is:

$\hat{Y} = 502.554 + 80.387X - 4.249X^2$

The  fitted  values  and  residuals  for  the  regression  function  in  terms  of  X  are 
exactly  the  same  as  for  the  regression  function  in  terms  of  the  deviations  x.  The 
reason  for  utilizing  model  (9.10),  which  is  expressed  in  terms  of  deviations  x,  is 
to  avoid  potential  calculational  difficulties  due  to  multicollinearity  between  X 
and $X^2$, inherent in polynomial regression.
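
A short numerical check of (9.15) is sketched below. It converts the deviation-form coefficients of Table 9.3a and confirms that both forms of the fitted function give the same fitted values at each dispenser level.

```python
import numpy as np

b0, b1, b11 = 705.474, 54.893, -4.249     # Table 9.3a, deviation form
X_bar = 3.0

# Formulas (9.15a) to (9.15c).
b0_prime = b0 - b1 * X_bar + b11 * X_bar**2
b1_prime = b1 - 2 * b11 * X_bar
b11_prime = b11

X = np.arange(0.0, 7.0)                   # dispenser levels 0, 1, ..., 6
x = X - X_bar
fitted_deviation_form = b0 + b1 * x + b11 * x**2
fitted_original_form = b0_prime + b1_prime * X + b11_prime * X**2

print(round(b0_prime, 3), round(b1_prime, 3), b11_prime)
print(np.allclose(fitted_deviation_form, fitted_original_form))
```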

Note 

The  estimated  standard  deviations  of  the  regression  coefficients  in  Table  9.3a  do  not 
apply  to  the  regression  coefficients  in  terms  of  X  obtained  by  (9.15).  If  the  estimated 
standard deviations for the regression coefficients in terms of X are desired, they may be
obtained  from  s2(b)  in  Table  9.3c  by  using  theorem  (6.49),  where  the  transformation 
matrix  A  is  developed  from  (9.15). 
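
The calculation described in this note can be sketched as follows, under the assumption that theorem (6.49) is the usual rule that a linear transformation $\mathbf{b}' = \mathbf{A}\mathbf{b}$ has variance-covariance matrix $\mathbf{A} s^2(\mathbf{b}) \mathbf{A}'$. The rows of the matrix A below simply restate (9.15a) to (9.15c).

```python
import numpy as np

s2_b = np.array([[10.2912, 0.0, -1.4702],
                 [0.0, 1.1026, 0.0],
                 [-1.4702, 0.0, 0.3675]])          # Table 9.3c
X_bar = 3.0

# Rows of A restate (9.15a)-(9.15c), so that b' = A b.
A = np.array([[1.0, -X_bar, X_bar**2],
              [0.0, 1.0, -2.0 * X_bar],
              [0.0, 0.0, 1.0]])

s2_b_prime = A @ s2_b @ A.T                        # assumed form of theorem (6.49)
s_b_prime = np.sqrt(np.diag(s2_b_prime))           # estimated standard deviations

print(np.round(s2_b_prime, 4))
print(np.round(s_b_prime, 3))
```

The standard deviations of $b_0'$ and $b_1'$ come out much larger than those of $b_0$ and $b_1$, which reflects the correlation between X and $X^2$ in the uncentered form.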


9.3  EXAMPLE  2— TWO  INDEPENDENT  VARIABLES 

We  shall  discuss  now  another  example  of  polynomial  regression,  this  one 
involving  two  independent  variables.  Rather  than  carrying  this  example  through 
all  of  the  various  analytical  stages  as  we  did  the  first  example,  we  shall  focus 
here  on  the  analysis  of  interaction  effects  and  quadratic  effects. 




Setting 

For  a  sample  of  18  managers  in  the  35-44  age  group,  Table  9.4  shows  the 
average  annual  income  during  the  past  two  years  (X1),  risk  aversion  score  (X2), 
and amount of life insurance carried (Y). Risk aversion was measured by a
standard  questionnaire  administered  to  each  manager;  the  higher  the  score  the 
greater  the  degree  of  risk  aversion. 

TABLE 9.4  Data for life insurance example

             Average Annual                           Amount of Life
             Income               Risk Aversion       Insurance Carried
Manager      (thousand dollars)   Score               (thousand dollars)
   i             $X_{i1}$             $X_{i2}$              $Y_i$

   1              66.290                 1                  196
   2              40.964                 5                   63
   3              72.996                10                  252
   4              45.010                 6                   84
   5              57.204                 4                  126
   6              26.852                 5                   14
   7              38.122                 4                   49
   8              35.840                 6                   49
   9              75.796                 9                  266
  10              37.408                 5                   49
  11              54.376                 2                  105
  12              46.186                 7                   98
  13              46.130                 4                   77
  14              30.366                 3                   14
  15              39.060                 5                   56
  16              79.380                 1                  245
  17              52.766                 8                  133
  18              55.916                 6                  133

         $\bar{X}_1 = 50.037$    $\bar{X}_2 = 5.389$

A  researcher  was  studying  the  relation  of  average  annual  income  and  risk 
aversion  to  amount  of  life  insurance  carried  by  managers  in  the  given  age  group. 
He  expected  that  a  quadratic  relation  would  hold  between  income  and  amount  of 
life  insurance  carried.  However,  he  would  not  have  been  surprised  if  aversion  to 
risk showed only linear effects and no quadratic effects on amount of life
insurance carried, and he was quite uncertain whether or not the two variables
interact in their effects on amount of life insurance carried. Hence, he fitted
the second-order polynomial regression model:

(9.16)  $Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{11} x_{i1}^2 + \beta_{22} x_{i2}^2 + \beta_{12} x_{i1} x_{i2} + \varepsilon_i$

where:

$x_{i1} = X_{i1} - \bar{X}_1$
$x_{i2} = X_{i2} - \bar{X}_2$

with  the  intention  of  first  testing  for  the  presence  of  interaction  effects  and  then 
for  quadratic  effects  of  aversion  to  risk. 




Development  of  model 

Table 9.5a contains the basic results for the fit of model (9.16). Since the
researcher wished to test first for the interaction effects ($\beta_{12} x_1 x_2$), he
entered the variables for the regression fit so as to obtain the extra sum of
squares $SSR(x_1 x_2 \mid x_1, x_2, x_1^2, x_2^2)$ for the partial F test. The ANOVA
table and decomposition of SSR into extra sums of squares are shown in Table 9.5b.

TABLE 9.5  Regression results for model (9.16): life insurance example

(a)  Regression Coefficients

Regression      Estimated                  Estimated
Coefficient     Regression Coefficient     Standard Deviation       t*

$\beta_0$            102.768                    .662               155.15
$\beta_1$              4.4930                   .0475               94.54
$\beta_2$              6.028                    .301                20.00
$\beta_{11}$            .03579                  .00219              16.34
$\beta_{22}$            .166                    .120                 1.38
$\beta_{12}$           -.0196                   .0140               -1.40

(b)  Analysis of Variance

Source of Variation                                  SS         df         MS

Regression                                        108,006        5       21,601
   $x_1$                                          104,474        1      104,474
   $x_2 \mid x_1$                                   2,284        1        2,284
   $x_1^2 \mid x_1, x_2$                            1,238        1        1,238
   $x_2^2 \mid x_1, x_2, x_1^2$                         3        1            3
   $x_1 x_2 \mid x_1, x_2, x_1^2, x_2^2$                6        1            6
Error                                                  36       12         3.00
Total                                             108,042       17

The test for the presence of interaction effects involves the alternatives:

(9.17)  $H_0$: $\beta_{12} = 0$
        $H_a$: $\beta_{12} \neq 0$

Using the partial F test statistic (8.24), the researcher obtained:

$F^* = \frac{MSR(x_1 x_2 \mid x_1, x_2, x_1^2, x_2^2)}{MSE} = \frac{6}{3} = 2.00$

For level of significance $\alpha = .05$, we require F(.95; 1, 12) = 4.75. Since
F* = 2.00 < 4.75, we conclude $H_0$, that no interaction effects exist. This result
was welcome to the researcher, as it simplifies the interpretation of the effects of
the two independent variables.

Note that the researcher could also have tested whether or not $\beta_{12} = 0$ by
using $|t^*| = |-1.40| = 1.40$ (Table 9.5a).




At  this  point,  the  researcher  tentatively  decided  to  adopt  the  no-interaction 
model: 

(9.18)  $Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{11} x_{i1}^2 + \beta_{22} x_{i2}^2 + \varepsilon_i$

but he still wished to examine whether a quadratic effect for risk aversion exists.
This test can be conducted by a partial F test without fitting a new model. We
utilize the definition of $SSR(x_1 x_2 \mid x_1, x_2, x_1^2, x_2^2)$:

$SSR(x_1 x_2 \mid x_1, x_2, x_1^2, x_2^2) = SSE(x_1, x_2, x_1^2, x_2^2) - SSE(x_1, x_2, x_1^2, x_2^2, x_1 x_2)$

Hence:

$SSE(x_1, x_2, x_1^2, x_2^2) = SSR(x_1 x_2 \mid x_1, x_2, x_1^2, x_2^2) + SSE(x_1, x_2, x_1^2, x_2^2, x_1 x_2) = 6 + 36 = 42$

When testing model (9.18) for:

(9.19)  $H_0$: $\beta_{22} = 0$
        $H_a$: $\beta_{22} \neq 0$

the partial F test statistic (8.24) is:

$F^* = \frac{SSR(x_2^2 \mid x_1, x_2, x_1^2)}{1} \div \frac{SSE(x_1, x_2, x_1^2, x_2^2)}{18 - 5} = \frac{3}{1} \div \frac{42}{13} = .93$

For a 5 percent level of significance, we require F(.95; 1, 13) = 4.67. Since
F* = .93 < 4.67, we conclude $H_0$, that there is no quadratic effect for aversion
to risk.
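
The bookkeeping behind the two partial F tests can be sketched from the sums of squares in Table 9.5b alone, with no new regression run. The variable names below are chosen for illustration only.

```python
# Sums of squares from Table 9.5b (life insurance example).
n = 18
sse_full = 36.0              # SSE(x1, x2, x1^2, x2^2, x1x2), 12 degrees of freedom
ssr_interaction = 6.0        # SSR(x1x2 | x1, x2, x1^2, x2^2)
ssr_x2_squared = 3.0         # SSR(x2^2 | x1, x2, x1^2)

# Test of beta_12 = 0 in model (9.16), which has 6 parameters.
f_interaction = (ssr_interaction / 1) / (sse_full / (n - 6))

# Dropping x1x2 moves its extra sum of squares into the error term of model (9.18).
sse_no_interaction = ssr_interaction + sse_full

# Test of beta_22 = 0 in model (9.18), which has 5 parameters.
f_quadratic = (ssr_x2_squared / 1) / (sse_no_interaction / (n - 5))

print(f_interaction, sse_no_interaction, round(f_quadratic, 2))
```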

Hence,  the  researcher  decided  to  adopt  the  revised  model: 

(9.20)  $Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{11} x_{i1}^2 + \varepsilon_i$

where:

$x_{i1} = X_{i1} - \bar{X}_1$
$x_{i2} = X_{i2} - \bar{X}_2$

and fitted this model to the data. He obtained the estimated response function:

$\hat{Y} = 103.136 + 4.551x_1 + 5.685x_2 + .0371x_1^2$

which in the original units is:

$\hat{Y} = -74.583 + .8383X_1 + 5.685X_2 + .0371X_1^2$

Figure 9.7 contains a three-dimensional computer-generated plot of this fitted
response surface. The researcher then used this fitted response function for
further investigation of the effects of average annual income and aversion to
risk on amount of life insurance carried in the population under study.


FIGURE 9.7  Three-dimensional computer-generated plot of response function
$\hat{Y} = -74.583 + .8383X_1 + 5.685X_2 + .0371X_1^2$: life insurance example

(Axis shown: Average Annual Income)


Comments 

1.  Note the advantage of computer packages which provide extra sums of squares in
appropriate order. The researcher was able to conduct an F test about the revised model
(9.18)  using  the  regression  run  for  model  (9.16).  The  equivalent  t  test  would  require  a 
new  run  for  fitting  model  (9.18). 

2.  When  multiple  tests  on  the  same  data  are  conducted,  there  exist,  as  noted  earlier, 
problems  with  respect  to  the  level  of  significance  for  the  family  of  inferences.  In  the 
example  just  cited,  the  researcher  was  willing  to  conduct  two  tests  on  the  same  data,  one 
for  interaction  effects  and  one  for  quadratic  effects  of  aversion  to  risk.  The  reason  was  that 
he  knew  by  the  Bonferroni  inequality  (5.6a)  that  the  family  level  of  significance  for  the 
two  tests  (each  conducted  at  the  5  percent  level  of  significance)  could  not  exceed  10 
percent. 

In Chapter 12, we shall discuss more fully the empirical determination of an
appropriate regression model.

9.4  ESTIMATING  THE  MAXIMUM  OR  MINIMUM  OF  A 
QUADRATIC  REGRESSION  FUNCTION 

Sometimes in quadratic regression, we wish to estimate the maximum (or
minimum) mean response of the regression function, and/or the level of X at
which the maximum (or minimum) occurs. Figure 9.2 illustrates a quadratic
response function with a maximum mean response.

Given the estimated quadratic response function:

(9.21)  $\hat{Y} = b_0 + b_1 x + b_{11} x^2$

the maximum (minimum) occurs at the level $x_m$:

(9.22)  $x_m = -\frac{b_1}{2 b_{11}}$

In terms of the original variable X, the maximum (minimum) occurs at the level
$X_m$:

(9.22a)  $X_m = \bar{X} - \frac{b_1}{2 b_{11}}$

The estimated mean response at $X_m$ is:

(9.23)  $\hat{Y}_m = b_0 - \frac{b_1^2}{4 b_{11}}$

$\hat{Y}_m$ is a maximum if $b_{11}$ is negative, and a minimum if $b_{11}$ is positive.


Example 

For our earlier cafeteria coffee sales example, the fitted regression curve was:

$\hat{Y} = 705.47 + 54.89x - 4.25x^2$

If the quadratic regression function were appropriate for larger x values than
those in the study, we could estimate that maximum mean coffee sales occur at:

$X_m = 3 - \dfrac{54.89}{2(-4.25)} = 9.46$

or at about X = 9. Since x is measured as a deviation from $\bar{X} = 3$, this level
corresponds to $x_m = 6.46$, and by (9.23) the estimated mean response there is:

$\hat{Y}_m = 705.47 - \dfrac{(54.89)^2}{4(-4.25)} = 882.7$
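The calculation in (9.22), (9.22a), and (9.23) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `quadratic_extremum` and its argument names are assumptions introduced here, and the coefficients are those of the coffee sales fit with $\bar{X} = 3$.

```python
def quadratic_extremum(b0, b1, b11, x_bar):
    """Locate the extremum of Y-hat = b0 + b1*x + b11*x**2, where x = X - x_bar.

    Returns (X_m, y_m, kind): the X level of the extremum in original units,
    the estimated mean response there, and "maximum" or "minimum".
    """
    if b11 == 0:
        raise ValueError("b11 must be nonzero for the quadratic to have an extremum")
    x_m = -b1 / (2.0 * b11)            # (9.22), in deviation units
    X_m = x_bar + x_m                  # (9.22a), in original units
    y_m = b0 - b1 ** 2 / (4.0 * b11)   # (9.23)
    kind = "maximum" if b11 < 0 else "minimum"
    return X_m, y_m, kind


# Coefficients of the fitted coffee sales curve; x is the deviation from X-bar = 3.
X_m, y_m, kind = quadratic_extremum(705.47, 54.89, -4.25, x_bar=3.0)
print(f"{kind} at X = {X_m:.2f}, estimated mean response = {y_m:.1f}")
```

Note that the fitted curve is evaluated at the deviation $x_m = X_m - \bar{X}$, not at $X_m$ itself; formula (9.23) avoids the substitution altogether.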


Comments 

1. To derive (9.22), we differentiate $\hat{Y}$ in (9.21) with respect to x, and set this
derivative equal to 0:

$\dfrac{d\hat{Y}}{dx} = \dfrac{d}{dx}(b_0 + b_1 x + b_{11} x^2) = b_1 + 2 b_{11} x = 0$

and obtain:

$x_m = -\dfrac{b_1}{2 b_{11}}$

Substituting this value into the fitted response function (9.21), we find:

$\hat{Y}_m = b_0 + b_1\left(\dfrac{-b_1}{2 b_{11}}\right) + b_{11}\left(\dfrac{-b_1}{2 b_{11}}\right)^2 = b_0 - \dfrac{b_1^2}{4 b_{11}}$


2. For large samples, the approximate estimated variance of $X_m$ is:

(9.24)  $s^2(X_m) = \dfrac{b_1^2}{4 b_{11}^2}\left[\dfrac{s^2(b_1)}{b_1^2} + \dfrac{s^2(b_{11})}{b_{11}^2} - \dfrac{2 s(b_1, b_{11})}{b_1 b_{11}}\right]$

This approximate estimated variance can be used to construct a confidence interval for the
true X level at which the maximum (minimum) occurs. Approximate confidence intervals
for $E(Y_m)$ can also be obtained. These are discussed in Reference 9.1.
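As a rough illustration of how (9.24) might be used, the sketch below evaluates the approximate variance and forms a large-sample interval for the X level of the extremum. The variance and covariance inputs are made-up numbers, not results from any example in the text, and the use of a standard normal multiplier (the hypothetical parameter `z`) is an assumption suited only to large samples.

```python
import math


def var_xm(b1, b11, s2_b1, s2_b11, s_b1_b11):
    """Approximate large-sample estimated variance of X_m, following (9.24)."""
    scale = b1 ** 2 / (4.0 * b11 ** 2)
    bracket = (s2_b1 / b1 ** 2
               + s2_b11 / b11 ** 2
               - 2.0 * s_b1_b11 / (b1 * b11))
    return scale * bracket


def approx_interval(X_m, variance, z=1.96):
    """Approximate interval X_m +/- z * s(X_m); z = 1.96 is an assumed multiplier."""
    half_width = z * math.sqrt(variance)
    return X_m - half_width, X_m + half_width


# Coefficients from the coffee sales fit; the variances and covariance below
# are hypothetical values chosen only to make the sketch runnable.
b1, b11, x_bar = 54.89, -4.25, 3.0
s2_b1, s2_b11, s_b1_b11 = 16.0, 1.0, 0.5

X_m = x_bar - b1 / (2.0 * b11)
variance = var_xm(b1, b11, s2_b1, s2_b11, s_b1_b11)
lower, upper = approx_interval(X_m, variance)
print(f"X_m = {X_m:.2f}, s^2(X_m) = {variance:.3f}, interval = ({lower:.2f}, {upper:.2f})")
```

Formula (9.24) is what one obtains by linearizing $X_m$ in $b_1$ and $b_{11}$, so the result agrees with a direct first-order (delta-method) calculation using the partial derivatives of $-b_1/(2b_{11})$.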


9.5  SOME  FURTHER  COMMENTS  ON  POLYNOMIAL 
REGRESSION 

1. The use of polynomial models in X is not without drawbacks. Such models
can be more expensive in degrees of freedom than alternative nonlinear models
or linear models with transformed variables. Another potential drawback is that
multicollinearity is unavoidable. Indeed, if the levels of X are restricted to a
narrow range, the degree of multicollinearity in the columns of the X matrix can
be quite high, especially for higher-degree polynomials. It is for this reason that
all polynomial regression models in this chapter are formulated in terms of
deviations $x_i = X_i - \bar{X}$. To illustrate how helpful the use of deviation variables can be,
in our life insurance example the coefficient of correlation between $X_1$ and $X_1^2$ is
.991, but it is only .445 between $x_1$ and $x_1^2$. As noted earlier, when
multicollinearity is high, serious calculational difficulties in inverting the X'X matrix can
arise.
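The effect of expressing X in deviations can be checked numerically. The sketch below uses the twelve speed levels from the Mileage study (Problem 9.6) purely as convenient X values; the helper `correlation` is an assumption written for this illustration. Because these levels are spaced symmetrically about their mean, the correlation between x and its square vanishes entirely. For unsymmetric levels the reduction is usually less complete, as in the life insurance figures quoted above.

```python
import math


def correlation(u, v):
    """Coefficient of simple correlation between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


# Speed levels from Problem 9.6, used here only as example X values.
X = [35, 35, 40, 40, 45, 45, 50, 50, 55, 55, 60, 60]
x_bar = sum(X) / len(X)
x = [Xi - x_bar for Xi in X]          # deviations from the mean

r_raw = correlation(X, [Xi ** 2 for Xi in X])
r_centered = correlation(x, [xi ** 2 for xi in x])
print(f"corr(X, X^2) = {r_raw:.4f}")
print(f"corr(x, x^2) = {r_centered:.4f}")
```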

2. An alternative to using variables expressed in deviations from the mean in
polynomial regression is to use orthogonal polynomials. Orthogonal polynomials
are uncorrelated. Some computer packages use orthogonal polynomials in their
polynomial regression routines and present the final fitted results in terms of both
the orthogonal polynomials and the original polynomials. Orthogonal polynomials
are discussed in specialized texts such as Reference 9.2.

3. Sometimes a quadratic response function is fitted for the purpose of
establishing the linearity of the response function when repeat observations are not
available for directly testing the linearity of the response function. Fitting the
quadratic model:

(9.25)  $Y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \varepsilon_i$

and testing whether $\beta_{11} = 0$ does not, however, necessarily establish that a linear
response function is appropriate. Figure 9.8 provides an example. If sample data
were obtained for the response function in Figure 9.8, model (9.25) fitted, and a
test on $\beta_{11}$ made, it likely would lead to the conclusion that $\beta_{11} = 0$. Yet a linear
response function clearly might not be appropriate. Examination of residuals
would disclose this lack of fit, and should always accompany formal testing of
polynomial regression coefficients.

FIGURE 9.8  Example of curvilinear response function (Y plotted against x)
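A small numerical sketch may help show why a zero quadratic coefficient does not establish linearity. The data below are hypothetical and noise-free, with a cubic-shaped response chosen for illustration; this is not the curve of Figure 9.8. Because the x levels are symmetric about zero, the columns for the constant, x, and $x^2$ minus its mean are mutually orthogonal, so each coefficient can be computed separately.

```python
# Hypothetical design: seven equally spaced X levels and a cubic-shaped response.
X = [1, 2, 3, 4, 5, 6, 7]
x_bar = sum(X) / len(X)
x = [Xi - x_bar for Xi in X]            # deviations: -3, -2, ..., 3
Y = [xi ** 3 for xi in x]               # noise-free curvilinear response

n = len(x)
m2 = sum(xi ** 2 for xi in x) / n       # mean of x^2
q = [xi ** 2 - m2 for xi in x]          # x^2 centered, orthogonal to 1 and x here

# With orthogonal columns, each least squares coefficient is a simple ratio.
b1 = sum(xi * yi for xi, yi in zip(x, Y)) / sum(xi ** 2 for xi in x)
b11 = sum(qi * yi for qi, yi in zip(q, Y)) / sum(qi ** 2 for qi in q)
b0 = sum(Y) / n - b11 * m2

residuals = [yi - (b0 + b1 * xi + b11 * xi ** 2) for xi, yi in zip(x, Y)]
print("b1 =", b1, " b11 =", b11)
print("residuals:", residuals)
```

Here the fitted quadratic coefficient is zero, so a test on it would not reject, yet the residuals alternate in sign in a systematic way across the x levels. That pattern is exactly what a residual plot would reveal.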

4. When a polynomial regression model with one independent variable is
employed, one ordinarily fits a polynomial to the highest power expected to be
appropriate, and the decomposition of SSR into extra sum of squares components
proceeds as follows:

$SSR(x)$
$SSR(x^2 \mid x)$
$SSR(x^3 \mid x, x^2)$
etc.

The reason for this approach is that generally one is most interested in whether
higher-order terms can be dropped from the model. Thus, when a cubic model is
fitted because it is expected that a third-order model will be sufficient and one
wishes to test whether $\beta_{111} = 0$, the appropriate extra sum of squares is
$SSR(x^3 \mid x, x^2)$. If, instead, one wishes to test whether a linear term is adequate so
that $\beta_{11} = \beta_{111} = 0$, the appropriate extra sum of squares is
$SSR(x^2, x^3 \mid x) = SSR(x^2 \mid x) + SSR(x^3 \mid x, x^2)$.

Ordinarily, one would not fit a third-order model and test first whether a
lower-order coefficient is zero, say, whether $\beta_{11} = 0$. The reason is that it is
usually desired to employ as simple a regression model as possible, which in the
case of polynomial regression means a lower-order model.




PROBLEMS 

9.1.  A speaker stated: “In developing third-order or higher-order polynomial
regression models in social science and managerial applications, inferences on the $\beta$’s
usually take the form of direct tests. There is relatively little interest in estimating
the $\beta$’s to assess effects of the individual polynomial terms.” Why might this be
so?

9.2.  Plot several contour curves for the quadratic response surface
$E(Y) = 140 + 4x_1^2 - 2x_2^2 + 5x_1 x_2$.

9.3.  A junior investment analyst used a polynomial regression model of relatively high
order in a research seminar on municipal bonds. She obtained an $R^2$ of .991 in the
regression of net interest yield of bond (Y) on industrial diversity index of
municipality (X) for seven bond issues. A classmate, unimpressed, said: “You
overfitted. Your curve follows the random effects in the data.”

a.  Comment on the criticism.

b.  Might $R_a^2$ defined in (7.33) be more appropriate than $R^2$ as a descriptive
measure here?

9.4.  A student in a class demonstration of how to fit a second-order polynomial model
in one independent variable entered the X variables in the form X, $X^2$. He was
disturbed when the computer program would not include $X^2$ in the regression
model and regressed Y on X only. The output contained the message: X-SQUARE
IS REDUNDANT VARIABLE. X'X IS NEAR-SINGULAR WHEN X-SQUARE
IS INCLUDED. Explain the situation. What should the student have
done?

9.5.  Refer to the life insurance example on page 314. A student observed that the
interaction term $\beta_{12} x_1 x_2$ and the quadratic effect term $\beta_{22} x_2^2$ were each dropped
from model (9.16) at an .05 level of significance and suggested that this same
result could have been obtained from a glance at Table 9.5a, since each relevant
$|t^*|$ statistic does not exceed t(.975; 12) = 2.179. Do you agree with the
student’s suggestion? Explain.

9.6.  Mileage study. The effectiveness of a new experimental overdrive gear in
reducing gasoline consumption was studied in 12 trials with a light truck equipped
with this gear. In the data that follow, $X_i$ denotes the constant speed (in miles per
hour) on the test track in the ith trial and $Y_i$ denotes miles per gallon obtained.

   i:   1   2   3   4   5   6   7   8   9  10  11  12
 X_i:  35  35  40  40  45  45  50  50  55  55  60  60
 Y_i:  22  20  28  31  37  38  41  39  34  37  27  30

The second-order regression model (9.1a) with independent normal error terms is
expected to be appropriate.

a.  Fit regression model (9.1a). Plot the fitted regression function and the data.
Does the quadratic regression function appear to be a good fit here? Find $R^2$.

b.  Test whether or not there is a regression relation. Control the risk of a Type I
error at .05. State the alternatives, decision rule, and conclusion.

c.  Estimate the mean miles per gallon for test runs at a speed of 48 miles per
hour; use a 95 percent confidence interval.

d.  Predict the miles per gallon in the next test run at 48 miles per hour; use a 95
percent prediction interval.

e.  Test whether the quadratic term can be dropped from the regression model;
use $\alpha = .05$. State the alternatives, decision rule, and conclusion.

f.  Express the fitted regression function obtained in part (a) in terms of the
original variable X.

9.7.  Refer to Mileage study Problem 9.6.

a.  Obtain the residuals and plot them against $\hat{Y}$. Also prepare a normal
probability plot. Interpret your plots.

b.  Test formally for lack of fit of the quadratic regression function; use $\alpha = .05$.
State the alternatives, decision rule, and conclusion. What assumptions did
you implicitly make in this test?

c.  Fit the third-order model (9.3) and test whether or not $\beta_{111} = 0$; use $\alpha = .05$.
State the alternatives, decision rule, and conclusion. Is your conclusion
consistent with your finding in part (b)?

9.8.  Piecework  operation.  An  operations  analyst  in  a  multinational  electronics  firm 
studied  factors  affecting  production  in  a  piecework  operation  where  earnings  are 
based  on  the  number  of  pieces  produced.  Two  employees  each  were  selected  from 
various  age  groups  and  data  on  their  productivity  last  year  were  obtained  (X  is  age 
of  employee,  in  years;  Y  is  employee’s  productivity,  coded): 


   i:    1    2    3    4    5    6    7    8    9
 X_i:   20   20   25   25   30   30   35   35   40
 Y_i:   97   93   99  105  109  106  109  111  100

   i:   10   11   12   13   14   15   16   17   18
 X_i:   40   45   45   50   50   55   55   60   60
 Y_i:  105   97  101  105  103  105  109  112  110

The analyst recognized that the relation between age and productivity is complex,
in part because earnings targets (which he could not measure) shift in complex
ways with age. However, he believed that for purposes of estimating mean
responses, the response function can be approximated suitably by a polynomial of
third order and that the error terms are independent and approximately normally
distributed.

a.  Fit regression model (9.3). Plot the fitted regression function and the data.
Does the cubic regression function appear to be a good fit here? Find $R^2$.

b.  Test whether or not there is a regression relation; use a level of significance of
.01. State the alternatives, decision rule, and conclusion. What is the P-value
of the test?

c.  Obtain joint interval estimates for the mean productivity of employees aged
53, 58, and 62, respectively. Use the most efficient simultaneous estimation
procedure and a 99 percent family confidence coefficient.

d.  Predict the productivity of an employee aged 53 using a 99 percent prediction
interval.

e.  Express the fitted regression function obtained in part (a) in terms of the
original variable X.

9.9.  Refer to Piecework operation Problem 9.8.

a.  Test whether both the quadratic and cubic terms can be dropped from the
regression model; use $\alpha = .01$. State the alternatives, decision rule, and
conclusion.

b.  Test whether the cubic term alone can be dropped from the regression model;
use $\alpha = .01$. State the alternatives, decision rule, and conclusion.

9.10.  Refer to Piecework operation Problem 9.8.

a.  Obtain the residuals and plot them against the fitted values. Also prepare a
normal probability plot. What do your plots show?

b.  Test formally for lack of fit. Control the risk of a Type I error at .01. State the
alternatives, decision rule, and conclusion. What assumptions did you
implicitly make in this test?

9.11.  Sales forecasting. The Wheaton Company introduced a new product in 1975.
Annual sales of this product (Y, in thousand units) follow; the time period (X) is
coded, with X = 1 for 1975.

   i:     1     2     3     4     5     6     7     8     9
 X_i:     1     2     3     4     5     6     7     8     9
 Y_i:  3.49  3.78  4.05  4.41  4.73  5.12  5.56  5.99  6.44

Assume that the second-order polynomial regression model (9.1a) with
independent normal error terms is appropriate.

a.  Fit regression model (9.1a). Plot the fitted regression function and the data.
Does the quadratic regression function appear to be a good fit here? What is
$R^2$? Do you believe that the quadratic regression function is appropriate for
making projections to 1995? Discuss.

b.  Obtain simultaneous Bonferroni confidence intervals for $\beta_1$ and $\beta_{11}$ with a 90
percent family confidence coefficient.

c.  Predict sales of the product for 1985 using a 90 percent prediction interval.

d.  Express the fitted regression function obtained in part (a) in the original X
units.

9.12.  Refer to Sales forecasting Problem 9.11.

a.  Test whether the quadratic term can be dropped from the regression model.
Control the risk of a Type I error at .10. State the alternatives, decision rule,
and conclusion. What is the P-value of the test?

b.  Obtain the residuals. Plot the residuals against the fitted values. Also plot
them against time. What do these plots show?

9.13.  Crop yield. An agronomist studied the effects of moisture ($X_1$, in inches) and
temperature ($X_2$, in °C) on the yield of a new hybrid tomato (Y). The experimental
data follow.

    i:     1     2     3     4     5     6     7     8     9    10    11    12    13
 X_i1:     6     6     6     6     6     8     8     8     8     8    10    10    10
 X_i2:    20    21    22    23    24    20    21    22    23    24    20    21    22
  Y_i:  49.2  48.1  48.0  49.6  47.0  51.5  51.7  50.4  51.2  48.4  51.1  51.5  50.3

    i:    14    15    16    17    18    19    20    21    22    23    24    25
 X_i1:    10    10    12    12    12    12    12    14    14    14    14    14
 X_i2:    23    24    20    21    22    23    24    20    21    22    23    24
  Y_i:  48.9  48.7  48.6  47.0  48.0  46.4  46.2  43.2  42.6  42.1  43.9  40.5

The agronomist expects that the second-order polynomial regression model (9.5)
with independent normal error terms is appropriate here.

a.  Fit regression model (9.5). Plot the Y observations against the fitted values.
Does the response function provide a good fit?

b.  Calculate $R^2$. What information does this measure provide?

c.  Test whether or not there is a regression relation; use $\alpha = .05$. State the
alternatives, decision rule, and conclusion. What is the P-value of the test?

d.  Estimate the mean yield when $X_1 = 7$ and $X_2 = 22$; use a 95 percent
confidence interval.

e.  Express the fitted response function obtained in part (a) in the original X
variables.

9.14.  Refer to Crop yield Problem 9.13.

a.  Test whether or not the interaction term can be dropped from the model.
Control the $\alpha$ risk at .005. State the alternatives, decision rule, and
conclusion.

b.  Assuming that the interaction term has been dropped from the model, test
whether or not the quadratic effect term for temperature can be dropped from
the model; control the $\alpha$ risk at .005. State the alternatives, decision rule, and
conclusion. What is the combined $\alpha$ risk for both the test here and the one in
part (a)?

c.  Fit the second-order polynomial model omitting the interaction term and the
quadratic effect term for temperature. Obtain the residuals and plot them
against the fitted values. What does your plot show?

9.15.  Computerized game. Students comprising firm A in a computerized marketing
game have approached you for assistance in analyzing the relation between
promotional expenditures (X) and demand for their firm’s product (Y) in the firm’s
home territory. They believe that the following characteristics hold in this
relation: (1) demand in the home territory is affected primarily by promotional
expenditures, (2) the relation is either quadratic or linear within the range of X levels
of interest to the firm. The team has provided the observations shown below for
the 14 periods covered in the game to date (X in thousand dollars, Y in thousand
units), and has stated that these observations span the X levels of interest.

   i:      1      2      3      4      5      6      7
 X_i:     17     15     25     10     18     15     20
 Y_i:  56.15  54.50  55.27  52.54  56.23  55.97  55.55

   i:      8      9     10     11     12     13     14
 X_i:     25     17     13     20     23     25     16
 Y_i:  54.32  55.14  54.28  55.78  55.65  54.96  55.06

Assume that the second-order model (9.1a) with independent normal error terms
applies.

a.  Fit this model and test whether a regression relation exists. Use a level of
significance of .01. State the alternatives, decision rule, and conclusion.

b.  Test whether the quadratic term can be dropped from the model. Use a level
of significance of .01. State the alternatives, decision rule, and conclusion.

c.  Obtain the residuals and plot them against $\hat{Y}$. Also obtain a normal probability
plot. What do your plots show?

d.  Conduct a formal test for lack of fit using a level of significance of .01. State
the alternatives, decision rule, and conclusion. Does your conclusion imply
that the model cannot be improved further? Discuss.

9.16.  Refer to Computerized game Problem 9.15. Someone who is familiar with this
computerized marketing game enters the discussion. She states that in the system
of equations on which the game is based, a quadratic relation does hold between
promotional expenditures and mean demand in the firm’s home territory. She
believes that another significant variable related to expected demand in the home
territory is the ratio of the firm’s selling price to the average competitive selling
price; however, she does not recall whether this price ratio has both linear and
quadratic effects. She also does not recall whether price ratio and promotional
expenditures interact in affecting demand. The firm’s price ratios for the 14
periods are as follows:

     i:      1      2      3      4      5      6      7
 Ratio:   .931   .976  1.045   .939  1.010  1.059  1.000

     i:      8      9     10     11     12     13     14
 Ratio:   .950   .995  1.011  1.008   .947  1.000  1.017

a.  Fit the second-order polynomial regression model (9.5) with promotional
expenditures ($X_1$) and price ratio ($X_2$) as independent variables. How much
has $R^2$ increased by adding the price ratio as an independent variable?

b.  Test whether the price ratio variable should be retained in the model. Control
the risk of Type I error at .05. State the alternatives, decision rule, and
conclusion.

c.  Assuming that the price ratio variable is to be retained in the model, test
whether the interaction term is needed in the model; use $\alpha = .01$. State the
alternatives, decision rule, and conclusion. What is the P-value of the test?

d.  The team has decided to adopt regression model (9.5) without interaction
effects. Fit this model and obtain the residuals. Plot the residuals against $\hat{Y}$
and against the time order of the observations. Also, prepare a normal
probability plot. Analyze these plots and state your findings.

9.17.  Refer  to  Mileage  study  Problem  9.6. 

a.  At  what  speed  is  the  estimated  quadratic  response  function  a  maximum? 
What  is  the  estimated  mean  mileage  at  this  speed? 

b.  Does  the  maximum  of  the  response  function  occur  within  the  scope  of  the 
model? 

9.18.  Refer  to  Sales  forecasting  Problem  9.11. 

a.  In  what  year  does  the  minimum  of  the  estimated  quadratic  response  function 
occur?  What  is  the  estimated  mean  sales  for  this  year? 

b.  Does  the  minimum  of  the  response  function  occur  within  the  scope  of  the 
model? 


EXERCISES 


9.19.  Consider the second-order regression model with one independent variable (9.1a)
and the following two sets of X values:

Set 1:  1.0  1.5  1.1  1.3  1.9   .8  1.2  1.4
Set 2:   12    1  123   17  415   71  283   38

For each set, calculate the coefficient of correlation between X and $X^2$, then
between x and $x^2$. Also calculate the coefficients of correlation between X and $X^3$
and between x and $x^3$. What generalizations are suggested by your results?




9.20.  (Calculus needed.) Refer to the second-order response function (9.2). Explain
precisely the meaning of the linear effect coefficient $\beta_1$ and the quadratic effect
coefficient $\beta_{11}$.

9.21.  a.  Derive the expressions for $b_0'$, $b_1'$, and $b_{11}'$ in (9.15).

b.  Using theorem (6.49), obtain the variance-covariance matrix for the
regression coefficients pertaining to the original X variable in terms of the
variance-covariance matrix for the regression coefficients pertaining to the
transformed x variable.

9.22.  How are the normal equations (9.12) simplified if the X values are equally spaced,
such as the time series representation $X_1 = 1, X_2 = 2, \ldots, X_n = n$?


PROJECTS 

9.23.  Refer to the SMSA data set. It is desired to fit the second-order regression model
(9.1a) for relating number of active physicians (Y) against total population (X).

a.  Fit the second-order regression model. Plot the residuals against the fitted
values. How well does the second-order model appear to fit the data?

b.  Obtain $R^2$ for the second-order regression model. Also obtain the coefficient
of simple determination $r^2$ for the first-order regression model. Has the
addition of the quadratic term in the regression model substantially increased the
coefficient of determination?

c.  Test whether the quadratic term can be dropped from the regression model;
use $\alpha = .05$. State the alternatives, decision rule, and conclusion.

d.  Omit observation 1 (New York City) from the data set. Fit the second-order
regression model (9.1a) based on the remaining 140 SMSA’s. Repeat the test
in part (c). Has the omission of the outlying observation affected your
conclusion about whether the quadratic term can be dropped from the model?

9.24.  Refer to the SMSA data set. A regression model relating serious crime rate (Y,
total serious crimes divided by total population) to population density ($X_1$, total
population divided by land area) and percent of population in central cities ($X_3$) is
to be constructed.

a.  Fit the second-order regression model (9.5). Plot the residuals against the
fitted values. How well does the second-order model appear to fit the data?
What is $R^2$?

b.  Test whether or not all quadratic and interaction terms can be dropped from
the model; use $\alpha = .01$. State the alternatives, decision rule, and conclusion.

c.  Instead of using the independent variable population density, total population
($X_1$) and land area ($X_2$) are to be employed as separate independent variables,
in addition to percent of population in central cities ($X_3$). The regression
model should contain linear and quadratic terms for total population, and
linear terms only for land area and percent of population in central cities. (No
interaction terms are to be included in this model.) Fit this regression model
and obtain $R^2$. Is this coefficient of multiple determination substantially
different from the one for the model in part (a)?

9.25.  Refer to the SENIC data set. The second-order regression model (9.1a) is to be
fitted for relating number of nurses (Y) to available facilities and services (X).

a.  Fit the second-order regression model. Plot the residuals against the fitted
values. How well does the second-order model appear to fit the data?

b.  Obtain $R^2$ for the second-order regression model. Also obtain the coefficient
of simple determination $r^2$ for the first-order regression model. Has the
addition of the quadratic term in the regression model substantially increased the
coefficient of determination?

c.  Test whether the quadratic term can be dropped from the regression model;
use $\alpha = .10$. State the alternatives, decision rule, and conclusion.

9.26.  Refer to Sales forecasting Problem 9.11. Instead of using a polynomial
regression model here, it has been suggested that a transformation of variables might
yield an equally good fit and would be more desirable since forecasting requires
extrapolation.

a.  Fit a linear regression model for relating $Y' = \sqrt{Y}$ against X. Plot the fitted
regression function and the transformed data. How effective does the use of
the transformed variable appear to be here?

b.  Obtain the fitted values and transform them to the original variable Y. Then
calculate the residuals in the original variable. Plot these residuals against X
and analyze your plot.

c.  Square the residuals in the original variable obtained in part (b), sum, and
obtain MSE. Compare this with MSE for the quadratic model in Problem
9.11a. How does the variability around the fitted regression function, as
measured by MSE, compare for the two approaches?

d.  Repeat parts (a) through (c) using the transformation $Y' = \log_{10} Y$. Is either
the square-root or the logarithmic transformation clearly preferable here?


CITED  REFERENCES 

9.1  Williams, E. J. Regression Analysis. New York: John Wiley & Sons, 1959.

9.2  Draper, N. R., and H. Smith. Applied Regression Analysis. 2d ed. New York: John
Wiley & Sons, 1981.


10 


Indicator  variables 


Throughout  the  previous  chapters  on  regression  analysis,  we  have  utilized 
quantitative  variables  in  the  regression  models  considered.  Quantitative  variables 
take  on  values  on  a  well-defined  scale;  examples  are  income,  age,  temperature, 
and  amounts  of  liquid  assets. 

Many variables of interest in business, economics, and the social and biological
sciences, however, are not quantitative but are qualitative. Examples of
qualitative variables are sex (male, female), purchase status (purchase, no purchase),
and disability status (not disabled, partly disabled, fully disabled).

Qualitative variables can be used in a multiple regression model just as
quantitative variables can, as we shall explain in this chapter. First, we take up the case
where some or all of the independent variables are qualitative. Then we turn to
the case where the dependent variable is qualitative.


10.1  ONE  INDEPENDENT  QUALITATIVE  VARIABLE 

An economist wished to relate the speed with which a particular insurance
innovation is adopted (Y) to the size of the insurance firm ($X_1$) and the type of
firm. The dependent variable is measured by the number of months elapsed
between the time the first firm adopted the innovation and the time the given firm
adopted the innovation. The first independent variable, size of firm, is
quantitative, and is measured by the amount of total assets of the firm. The second
independent variable, type of firm, is qualitative and is composed of two
classes: stock companies and mutual companies. In order that such a qualitative
variable can be used in a regression model, quantitative indicators for the classes
of the qualitative variable must be found.


Indicator  variables 


There are many ways of quantitatively identifying the classes of a qualitative
variable. We shall use indicator variables that take on the values 0 and 1. These
indicator variables are easy to use and are widely employed, but they are by no
means the only way to quantify a qualitative variable.

For our example, where the qualitative variable has two classes, we might
define two indicator variables $X_2$ and $X_3$ as follows:

(10.1)  $X_2 = \begin{cases} 1 & \text{if stock company} \\ 0 & \text{otherwise} \end{cases}$
        $X_3 = \begin{cases} 1 & \text{if mutual company} \\ 0 & \text{otherwise} \end{cases}$

Assuming that a first-order model is to be employed, it would be:

(10.2)  $Y_i = \beta_0 X_{i0} + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i$   where $X_{i0} = 1$

This intuitive approach of setting up an indicator variable for each class of the
qualitative variable unfortunately leads to computational difficulties. To see why,
suppose we have n = 4 observations, the first two being stock firms for which
$X_2 = 1$ and $X_3 = 0$, and the second two being mutual firms for which $X_2 = 0$ and
$X_3 = 1$. The X matrix, with columns $X_0$, $X_1$, $X_2$, $X_3$, would then be:

$X = \begin{bmatrix} 1 & X_{11} & 1 & 0 \\ 1 & X_{21} & 1 & 0 \\ 1 & X_{31} & 0 & 1 \\ 1 & X_{41} & 0 & 1 \end{bmatrix}$

Note  that  the  X0  column  is  equal  to  the  sum  of  the  X2  and  X3  columns,  so  that  the 
columns  are  linearly  dependent  according  to  definition  (6.22).  This  has  a  serious 
effect  on  the  X'X  matrix: 


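$X'X = \begin{bmatrix} 1 & 1 & 1 & 1 \\ X_{11} & X_{21} & X_{31} & X_{41} \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & X_{11} & 1 & 0 \\ 1 & X_{21} & 1 & 0 \\ 1 & X_{31} & 0 & 1 \\ 1 & X_{41} & 0 & 1 \end{bmatrix}$

$\quad = \begin{bmatrix} 4 & \sum_{i=1}^{4} X_{i1} & 2 & 2 \\ \sum_{i=1}^{4} X_{i1} & \sum_{i=1}^{4} X_{i1}^2 & \sum_{i=1}^{2} X_{i1} & \sum_{i=3}^{4} X_{i1} \\ 2 & \sum_{i=1}^{2} X_{i1} & 2 & 0 \\ 2 & \sum_{i=3}^{4} X_{i1} & 0 & 2 \end{bmatrix}$

The first column of the X'X matrix equals the sum
of the last two columns, so that the columns are linearly dependent. Hence, the
X'X matrix does not have an inverse, and no unique estimators of the regression
coefficients can be found.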
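The singularity can be verified directly for the four-observation layout described above. In the sketch below the firm sizes are hypothetical numbers, and `determinant` and `gram` are helper functions written for this illustration; exact rational arithmetic is used so that a zero determinant is not blurred by rounding.

```python
from fractions import Fraction


def determinant(matrix):
    """Exact determinant by Gaussian elimination over rational numbers."""
    M = [[Fraction(v) for v in row] for row in matrix]
    n = len(M)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)          # no usable pivot: the matrix is singular
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            det = -det                  # a row swap changes the sign
        det *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= factor * M[col][c]
    return det


def gram(X):
    """Return X'X for a matrix stored as a list of rows."""
    cols = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


sizes = [12, 20, 15, 30]                # hypothetical firm sizes X_i1
# Columns: X0 = 1, X1 = size, X2 = stock indicator, X3 = mutual indicator.
X_full = [[1, sizes[0], 1, 0],
          [1, sizes[1], 1, 0],
          [1, sizes[2], 0, 1],
          [1, sizes[3], 0, 1]]
X_reduced = [row[:3] for row in X_full]  # same data with the X3 column dropped

det_full = determinant(gram(X_full))
det_reduced = determinant(gram(X_reduced))
print("det X'X with both indicators:", det_full)
print("det X'X with X3 dropped:    ", det_reduced)
```

With both indicator columns present the determinant is exactly zero. Once one indicator column is removed, the remaining columns are linearly independent and X'X can be inverted.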

A  simple  way  out  of  this  difficulty  is  to  drop  one  of  the  indicator  variables.  In 
our  example,  for  instance,  we  might  drop  X3.  While  dropping  one  indicator 
variable  is  not  the  only  way  out  of  the  difficulty,  it  leads  to  simple  interpretations 
of  the  parameters.  In  general,  therefore,  we  shall  follow  the  principle: 

(10.3)  A qualitative variable with c classes will be represented by c - 1
indicator variables, each taking on the values 0 and 1.


Note 

Indicator  variables  are  frequently  also  called  dummy  variables  or  binary  variables. 
The  latter  term  has  reference  to  the  binary  number  system  containing  only  0  and  1. 


Interpretation  of  regression  parameters 

Returning  to  our  example,  suppose  that  we  drop  the  indicator  variable  X3  from 
model  (10.2)  so  that  the  model  becomes: 

(10.4)  $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$

where:

$X_{i1}$ = Size of firm

$X_{i2} = \begin{cases} 1 & \text{if stock company} \\ 0 & \text{otherwise} \end{cases}$

The  response  function  for  this  model  is: 

(10.5)  $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$

To  understand  the  meaning  of  the  parameters  of  this  model,  consider  first  the 
case  of  a  mutual  firm.  For  such  a  firm,  X2  =  0  and  we  have: 


(10.5a)  $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2(0) = \beta_0 + \beta_1 X_1$    Mutual firms

Thus, the response function for mutual firms is a straight line, with Y intercept $\beta_0$ and slope $\beta_1$. This response function is shown in Figure 10.1.

For a stock firm, $X_2 = 1$ and the response function (10.5) is:

(10.5b)  $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2(1) = (\beta_0 + \beta_2) + \beta_1 X_1$    Stock firms

This also is a straight line, with the same slope $\beta_1$ but with Y intercept $\beta_0 + \beta_2$. This response function is also shown in Figure 10.1.

The meaning of the parameters in the response function (10.5) is now clear. With reference to our earlier example, the mean time elapsed before the innovation is adopted, E(Y), is a linear function of size of firm ($X_1$), with the same slope $\beta_1$ for both types of firms. $\beta_2$ indicates how much higher (lower) the response function for stock firms is than the one for mutual firms. Thus, $\beta_2$ measures the differential effect of type of firm. In general, $\beta_2$ shows how much higher (lower) the mean response line is for the class coded 1 than the line for the class coded 0.

FIGURE 10.1  Illustration of meaning of regression parameters for model (10.4) with indicator variable X2: insurance innovation example

[Figure not reproduced. Vertical axis: Number of Months Elapsed. Horizontal axis: Size of Firm.]



Example 

With  reference  to  our  earlier  illustration,  the  economist  studied  10  mutual 
firms  and  10  stock  firms.  The  data  are  shown  in  Table  10.1.  The  Y  and  X  data 
matrices  are  shown  in  Table  10.2.  Note  that  X2=  1  for  each  stock  firm  and 
X2  =  0  for  each  mutual  firm. 

Given  the  Y  and  X  matrices  in  Table  10.2,  fitting  the  regression  model  (10.4) 
is  straightforward.  Table  10.3  presents  the  key  results  from  a  computer  run.  The 
fitted  response  function  is: 

$\hat{Y} = 33.87407 - .10174X_1 + 8.05547X_2$

Figure  10.2  (p.  334)  contains  the  fitted  response  function  for  each  type  of  firm, 
together  with  the  actual  observations. 

The economist was most interested in the effect of type of firm ($X_2$) on the elapsed time for the innovation to be adopted. He therefore desired to obtain a 95 percent confidence interval for $\beta_2$. We require t(.975; 17) = 2.110 and obtain from the data in Table 10.3:

$$4.97675 = 8.05547 - 2.110(1.45911) \le \beta_2 \le 8.05547 + 2.110(1.45911) = 11.13419$$

Thus,  with  95  percent  confidence,  we  conclude  that  stock  companies  tend  to 
adopt  the  particular  innovation  studied  somewhere  between  5  and  11  months 
later,  on  the  average,  than  mutual  companies. 
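
The fit and the confidence interval can be reproduced with a few lines of matrix arithmetic. The sketch below uses Python with NumPy (a choice made here for illustration, not the computer run referred to in the text); the data are those of Table 10.1, the variable names are assumptions, and the t value 2.110 is the one quoted above.

```python
import numpy as np

# Data from Table 10.1: months elapsed (Y), size of firm (X1), type of firm.
months = np.array([17, 26, 21, 30, 22, 0, 12, 19, 4, 16,
                   28, 15, 11, 38, 31, 21, 20, 13, 30, 14], dtype=float)
size = np.array([151, 92, 175, 31, 104, 277, 210, 120, 290, 238,
                 164, 272, 295, 68, 85, 224, 166, 305, 124, 246], dtype=float)
stock = np.array([0] * 10 + [1] * 10, dtype=float)  # X2: 1 = stock, 0 = mutual

# Design matrix with columns X0 = 1, X1, X2.
X = np.column_stack([np.ones_like(size), size, stock])

# Least squares estimates b = (X'X)^{-1} X'Y.
b = np.linalg.solve(X.T @ X, X.T @ months)

# MSE and estimated standard deviations of the coefficients.
resid = months - X @ b
df_error = X.shape[0] - X.shape[1]  # 20 - 3 = 17
mse = resid @ resid / df_error
s_b = np.sqrt(np.diag(mse * np.linalg.inv(X.T @ X)))

# 95 percent confidence interval for beta_2 with t(.975; 17) = 2.110.
t_crit = 2.110
ci_beta2 = (b[2] - t_crit * s_b[2], b[2] + t_crit * s_b[2])

print("b =", b)
print("s(b) =", s_b)
print("95 percent interval for beta_2:", ci_beta2)
```

The printed coefficients and standard deviations can be compared with Table 10.3.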

TABLE 10.1  Data for insurance innovation study

        Number of          Size of Firm
Firm    Months Elapsed     (million dollars)    Type of
 i      Yi                 Xi1                  Firm

  1     17                 151                  Mutual
  2     26                  92                  Mutual
  3     21                 175                  Mutual
  4     30                  31                  Mutual
  5     22                 104                  Mutual
  6      0                 277                  Mutual
  7     12                 210                  Mutual
  8     19                 120                  Mutual
  9      4                 290                  Mutual
 10     16                 238                  Mutual
 11     28                 164                  Stock
 12     15                 272                  Stock
 13     11                 295                  Stock
 14     38                  68                  Stock
 15     31                  85                  Stock
 16     21                 224                  Stock
 17     20                 166                  Stock
 18     13                 305                  Stock
 19     30                 124                  Stock
 20     14                 246                  Stock



TABLE 10.2  Data matrices for insurance innovation study

The Y vector (20 x 1) and the X matrix (20 x 3), listed row by row:

   Y         X0     X1    X2

  17          1    151     0
  26          1     92     0
  21          1    175     0
  30          1     31     0
  22          1    104     0
   0          1    277     0
  12          1    210     0
  19          1    120     0
   4          1    290     0
  16          1    238     0
  28          1    164     1
  15          1    272     1
  11          1    295     1
  38          1     68     1
  31          1     85     1
  21          1    224     1
  20          1    166     1
  13          1    305     1
  30          1    124     1
  14          1    246     1

TABLE 10.3  Regression results for fit of model (10.4): insurance innovation example

(a)  Regression Coefficients

Regression     Estimated                 Estimated
Coefficient    Regression Coefficient    Standard Deviation      t*

β0                 33.87407                  1.81386            18.68
β1                  -.10174                   .00889           -11.44
β2                  8.05547                  1.45911             5.52

(b)  Analysis of Variance

Source of
Variation        SS          df       MS

Regression     1,504.41       2     752.20
Error            176.39      17      10.38
Total          1,680.80      19




A formal test of:

$H_0\colon \beta_2 = 0$
$H_a\colon \beta_2 \neq 0$

with level of significance .05 would lead to $H_a$, that type of firm has an effect, since the 95 percent confidence interval for $\beta_2$ does not include zero.


FIGURE 10.2  Fitted regression functions for model (10.4): insurance innovation example

[Figure not reproduced. Vertical axis: Number of Months Elapsed. Horizontal axis: Size of Firm.]



The economist also carried out other analyses, some of which will be described shortly.

Note 

The reader may wonder why we did not simply fit separate regressions for stock firms and mutual firms in our example, and instead adopted the approach of fitting one regression with an indicator variable. There are two reasons for this. Since the model assumed equal slopes and the same constant error term variance for each type of firm, the common slope $\beta_1$ can best be estimated by pooling the two types of firms. Also, other inferences, such as ones pertaining to $\beta_0$ and $\beta_2$, can be made more precisely by working with one regression model containing an indicator variable since more degrees of freedom will then be associated with MSE.

10.2  MODEL  CONTAINING  INTERACTION  EFFECTS 

In  our  earlier  illustration,  the  economist  actually  did  not  begin  his  analysis 
with  model  (10.4)  since  he  expected  interaction  effects  between  size  and  type  of 
firm.  Even  though  one  of  the  independent  variables  in  the  regression  model  is 
qualitative,  interaction  effects  are  introduced  into  the  model  in  the  usual  manner, 
by  including  cross-product  terms.  A  first-order  model  with  an  interaction  term 
for  our  example  is: 

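(10.6)  $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1}X_{i2} + \varepsilon_i$

where:

$X_{i1}$ = Size of firm

$X_{i2} = \begin{cases} 1 & \text{if stock company} \\ 0 & \text{otherwise} \end{cases}$

The response function for this model is:

(10.7)  $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2$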

Meaning  of  regression  parameters 

The meaning of the regression parameters in the response function (10.7) can best be understood by examining the nature of this function for each type of firm. For a mutual firm, $X_2 = 0$ and hence $X_1 X_2 = 0$. The response function for mutual firms therefore is:

(10.7a)  $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2(0) + \beta_3(0) = \beta_0 + \beta_1 X_1$    Mutual firms

This response function is shown in Figure 10.3. Note that the Y intercept is $\beta_0$ and the slope is $\beta_1$ for the response function for mutual firms.

For stock firms, $X_2 = 1$ and hence $X_1 X_2 = X_1$. The response function for stock firms therefore is:

$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2(1) + \beta_3 X_1$

or:

(10.7b)  $E(Y) = (\beta_0 + \beta_2) + (\beta_1 + \beta_3)X_1$    Stock firms


FIGURE 10.3  Illustration of meaning of regression parameters for model (10.6) with indicator variable X2 and interaction term: insurance innovation example

[Figure not reproduced. Vertical axis: Number of Months Elapsed.]


This response function is also shown in Figure 10.3. Note that the response function for stock firms has Y intercept $\beta_0 + \beta_2$ and slope $\beta_1 + \beta_3$.

Thus, $\beta_2$ indicates how much greater (smaller) is the Y intercept for the class coded 1 than that for the class coded 0, and similarly $\beta_3$ indicates how much greater (smaller) is the slope for the class coded 1 than that for the class coded 0. Because both the intercept and the slope differ for the two classes in model (10.6), it is no longer true that $\beta_2$ indicates how much higher (lower) one response line is than the other. Figure 10.3 makes it clear that the effect of type of firm with model (10.6) depends on the size of the firm. For smaller firms, according to Figure 10.3, mutual firms tend to innovate more quickly, but for larger firms stock firms tend to innovate more quickly. Thus, when interaction effects are present, the effect of the qualitative variable can only be studied by comparing the regression functions for each class of the qualitative variable.


FIGURE 10.4  Another illustration of model (10.6) with indicator variable X2 and interaction term: insurance innovation example

[Figure not reproduced. Vertical axis: Number of Months Elapsed.]


Figure  10.4  illustrates  another  possible  interaction  situation.  Here,  mutual  firms 
tend  to  introduce  the  innovation  more  quickly  than  stock  firms  for  all  sizes  of 
firms  in  the  scope  of  the  model,  but  the  differential  effect  is  much  smaller  for 
large  firms  than  for  small  ones. 


Example 

Since the economist anticipated that interaction effects between size and type of firm may be present, he actually first wished to fit model (10.6):

$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1}X_{i2} + \varepsilon_i$

Table 10.4 shows the X matrix for this model. The Y matrix is the same as in Table 10.2. Note that the $X_1 X_2$ column in the X matrix in Table 10.4 contains 0 for mutual companies and $X_{i1}$ for stock companies.


TABLE 10.4  X matrix for fitting model (10.6) with interaction term: insurance innovation example

 X0     X1    X2   X1X2

  1    151     0      0
  1     92     0      0
  1    175     0      0
  1     31     0      0
  1    104     0      0
  1    277     0      0
  1    210     0      0
  1    120     0      0
  1    290     0      0
  1    238     0      0
  1    164     1    164
  1    272     1    272
  1    295     1    295
  1     68     1     68
  1     85     1     85
  1    224     1    224
  1    166     1    166
  1    305     1    305
  1    124     1    124
  1    246     1    246

Given the Y and X matrices, the regression fit is routine. Basic results from a computer run are shown in Table 10.5. To test for the presence of interaction effects:

$H_0\colon \beta_3 = 0$
$H_a\colon \beta_3 \neq 0$

the economist used the t* statistic from Table 10.5a:

$$t^* = \frac{b_3}{s(b_3)} = \frac{-.0004171}{.01833} = -.02$$

For level of significance .05, we require t(.975; 16) = 2.120. Since $|t^*| = .02 \le 2.120$, we conclude $H_0$, i.e., $\beta_3 = 0$, so that no interaction effects are present. The two-sided P-value for the test is very high, namely, .98. It was because of this result that the economist adopted model (10.4) with no interaction term, which we discussed earlier.
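
The interaction test can be sketched in the same way. The code below is an illustration in Python with NumPy, not the computer run behind Table 10.5; the data are those of Table 10.1, the variable names are assumptions, and the critical value 2.120 is the one quoted above.

```python
import numpy as np

# Data from Table 10.1.
months = np.array([17, 26, 21, 30, 22, 0, 12, 19, 4, 16,
                   28, 15, 11, 38, 31, 21, 20, 13, 30, 14], dtype=float)
size = np.array([151, 92, 175, 31, 104, 277, 210, 120, 290, 238,
                 164, 272, 295, 68, 85, 224, 166, 305, 124, 246], dtype=float)
stock = np.array([0] * 10 + [1] * 10, dtype=float)  # X2: 1 = stock, 0 = mutual

# Columns X0, X1, X2 and the cross-product X1*X2, as in Table 10.4.
X = np.column_stack([np.ones_like(size), size, stock, size * stock])

# Least squares fit of model (10.6).
b, *_ = np.linalg.lstsq(X, months, rcond=None)

# t* statistics: each coefficient divided by its estimated standard deviation.
resid = months - X @ b
df_error = X.shape[0] - X.shape[1]  # 20 - 4 = 16
mse = resid @ resid / df_error
s_b = np.sqrt(np.diag(mse * np.linalg.inv(X.T @ X)))
t_star = b / s_b

# Decision rule at level .05 with t(.975; 16) = 2.120.
conclude_no_interaction = bool(abs(t_star[3]) <= 2.120)

print("b3 =", b[3], " t* =", t_star[3])
print("conclude H0 (no interaction):", conclude_no_interaction)
```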


TABLE 10.5  Regression results for fit of model (10.6) with interaction term: insurance innovation example

(a)  Regression Coefficients

Regression     Estimated                 Estimated
Coefficient    Regression Coefficient    Standard Deviation      t*

β0                 33.83837                  2.44065            13.86
β1                  -.10153                   .01305            -7.78
β2                  8.13125                  3.65405             2.23
β3                  -.0004171                 .01833             -.02

(b)  Analysis of Variance

Source of
Variation        SS          df       MS

Regression     1,504.42       3     501.47
Error            176.38      16      11.02
Total          1,680.80      19

Note

Fitting model (10.6) yields the same response functions as fitting separate regressions for stock firms and mutual firms. An advantage of using model (10.6) with an indicator variable is that one regression run on the computer will yield both fitted regressions.

Another advantage is that tests for comparing the regression functions for the different classes of the qualitative variable can be clearly seen to be tests of regression coefficients in a general linear model. For instance, Figure 10.3 makes it clear for our example that the test whether the two regression functions have the same slope involves:

$H_0\colon \beta_3 = 0$
$H_a\colon \beta_3 \neq 0$

Similarly, the test whether the two regression functions in our example are identical would involve:

$H_0\colon \beta_2 = \beta_3 = 0$
$H_a\colon$ not both $\beta_2 = 0$ and $\beta_3 = 0$


10.3  MORE  COMPLEX  MODELS 

We now briefly consider more complex models involving qualitative independent variables.

Qualitative  variable  with  more  than  two  classes 

If a qualitative independent variable has more than two classes, we require additional indicator variables in the regression model. Consider the regression of tool wear (Y) on tool speed ($X_1$), where we wish to include also tool model (M1, M2, M3, M4) as an independent variable. Since the qualitative variable (tool model) has four classes, we require three indicator variables. Let us define them as follows:

(10.8)

$$X_2 = \begin{cases} 1 & \text{if tool model M1} \\ 0 & \text{otherwise} \end{cases} \qquad X_3 = \begin{cases} 1 & \text{if tool model M2} \\ 0 & \text{otherwise} \end{cases} \qquad X_4 = \begin{cases} 1 & \text{if tool model M3} \\ 0 & \text{otherwise} \end{cases}$$

First-order  model.  A  first-order  model  is: 

(10.9)  $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_4 X_{i4} + \varepsilon_i$

For this model, the data input for the X matrix would be as follows:

Tool Model    X0    X1     X2    X3    X4

M1             1    Xi1     1     0     0
M2             1    Xi1     0     1     0
M3             1    Xi1     0     0     1
M4             1    Xi1     0     0     0
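
A coding scheme of this kind is easy to generate mechanically. The following Python sketch builds the c - 1 indicator columns for a qualitative variable given a chosen reference class; the helper name `indicator_columns`, the tool speeds, and the assignment of tools to models are all hypothetical and serve only to illustrate the layout above.

```python
import numpy as np

def indicator_columns(labels, reference):
    """Return (classes, matrix) with one 0/1 column per non-reference class."""
    classes = [c for c in sorted(set(labels)) if c != reference]
    cols = np.array([[1.0 if lab == c else 0.0 for c in classes]
                     for lab in labels])
    return classes, cols

# Hypothetical tool speeds and tool models (illustration only).
speed = np.array([10.0, 12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 9.0])
model = ["M1", "M2", "M3", "M4", "M1", "M2", "M3", "M4"]

# M4 is the class for which X2 = X3 = X4 = 0.
classes, ind = indicator_columns(model, reference="M4")

# X matrix with columns X0, X1, X2, X3, X4.
X = np.column_stack([np.ones(len(speed)), speed, ind])

print(classes)
print(X)
```

Each row for an M4 tool has zeros in all three indicator columns, so M4 serves as the class against which the other tool models are compared.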

The  response  function  for  model  (10.9)  is: 

(10.10)  $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4$

To see the meaning of the regression parameters, consider first the response function for tool models M4 for which $X_2 = 0$, $X_3 = 0$, and $X_4 = 0$:

(10.10a)  $E(Y) = \beta_0 + \beta_1 X_1$    Tool models M4

For tool models M1, $X_2 = 1$, $X_3 = 0$, and $X_4 = 0$, and the response function is:

(10.10b)  $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 = (\beta_0 + \beta_2) + \beta_1 X_1$    Tool models M1

Similarly, the response functions for tool models M2 and M3 are:

(10.10c)  $E(Y) = (\beta_0 + \beta_3) + \beta_1 X_1$    Tool models M2

(10.10d)  $E(Y) = (\beta_0 + \beta_4) + \beta_1 X_1$    Tool models M3

Thus, the response function (10.10) implies that the regression of tool wear on tool speed is linear, with the same slope for all types of tool models. The coefficients $\beta_2$, $\beta_3$, and $\beta_4$ indicate, respectively, how much higher (lower) the response functions for tool models M1, M2, and M3 are than the one for tool models M4. Thus, $\beta_2$, $\beta_3$, and $\beta_4$ measure the differential effects of the qualitative variable classes on the height of the response function, always compared with the class for which $X_2 = X_3 = X_4 = 0$. Figure 10.5 illustrates a possible arrangement of the response functions.

FIGURE 10.5  Illustration of model (10.9): tool wear example

[Figure not reproduced. Vertical axis: Tool Wear.]

When using model (10.9), one may wish to estimate differential effects other than against tool models M4. For instance, $\beta_4 - \beta_3$ measures how much higher (lower) the response function for tool models M3 is than the response function for tool models M2, as may be seen by comparing (10.10c) and (10.10d). The point estimator of this quantity is, of course, $b_4 - b_3$, and the estimated variance of this estimator is:

(10.11)  $s^2(b_4 - b_3) = s^2(b_4) + s^2(b_3) - 2s(b_4, b_3)$

The needed variances and covariance can be readily obtained from the estimated variance-covariance matrix of the regression coefficients.
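
As an illustration of (10.11), the sketch below fits model (10.9) to simulated data and reads the needed variances and covariance off the estimated variance-covariance matrix. Everything about the data is an assumption made for the example: the sample size, the parameter values, the error standard deviation, and the helper name `var_of_difference` are not from the text.

```python
import numpy as np

def var_of_difference(cov_b, j, k):
    """Estimated variance of b_j - b_k: s2(b_j) + s2(b_k) - 2 s(b_j, b_k)."""
    return cov_b[j, j] + cov_b[k, k] - 2.0 * cov_b[j, k]

# Simulated tool wear data (hypothetical): 6 tools for each of 4 models.
rng = np.random.default_rng(0)
n_per = 6
n = 4 * n_per
speed = rng.uniform(8.0, 16.0, size=n)
model = np.repeat(np.arange(4), n_per)                  # 0, 1, 2, 3 = M1, M2, M3, M4
ind = (model[:, None] == np.arange(3)).astype(float)    # X2, X3, X4
X = np.column_stack([np.ones(n), speed, ind])

beta = np.array([5.0, 2.0, 3.0, 1.0, -1.0])             # made-up parameters
y = X @ beta + rng.normal(0.0, 1.0, size=n)

# Fit and estimated variance-covariance matrix s2{b} = MSE (X'X)^{-1}.
b = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ b
mse = resid @ resid / (n - X.shape[1])
cov_b = mse * np.linalg.inv(X.T @ X)

# Point estimate and estimated variance for beta_4 - beta_3 (M3 versus M2).
diff = b[4] - b[3]
s2_diff = var_of_difference(cov_b, 4, 3)

print("b4 - b3 =", diff, " estimated variance =", s2_diff)
```

The same value results from the quadratic form c' s2{b} c with c = (0, 0, 0, -1, 1)', which is a convenient check.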

First-order  model  with  interactions  added.  If  interaction  effects  between 
tool  speed  and  tool  model  are  present  in  our  previous  illustration,  model  (10.9) 
would  be  modified  as  follows: 

(10.12)  $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_4 X_{i4} + \beta_5 X_{i1}X_{i2} + \beta_6 X_{i1}X_{i3} + \beta_7 X_{i1}X_{i4} + \varepsilon_i$

The response function for tool models M4, for which $X_2 = 0$, $X_3 = 0$, and $X_4 = 0$, is as follows:

(10.13a)  $E(Y) = \beta_0 + \beta_1 X_1$    Tool models M4

Similarly, we find for the other tool models:

(10.13b)  $E(Y) = (\beta_0 + \beta_2) + (\beta_1 + \beta_5)X_1$    Tool models M1

(10.13c)  $E(Y) = (\beta_0 + \beta_3) + (\beta_1 + \beta_6)X_1$    Tool models M2

(10.13d)  $E(Y) = (\beta_0 + \beta_4) + (\beta_1 + \beta_7)X_1$    Tool models M3

Thus, the interaction model (10.12) implies that each tool model has its own regression line, with different intercepts and slopes for the different tool models.


More  than  one  qualitative  independent  variable 

Models can readily be constructed for cases where two or more of the independent variables are qualitative. Consider the regression of advertising expenditures (Y) on sales ($X_1$), type of firm (incorporated, not incorporated), and quality of sales management (high, low). We may define:

(10.14)

$$X_2 = \begin{cases} 1 & \text{if firm incorporated} \\ 0 & \text{otherwise} \end{cases} \qquad X_3 = \begin{cases} 1 & \text{if quality of sales management high} \\ 0 & \text{otherwise} \end{cases}$$


First-order  model.  A  first-order  model  for  the  above  example  is: 

(10.15)  $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i$

This model implies that the response function of advertising expenditures on sales is linear, with the same slope for all "type of firm-quality of sales management" combinations, and $\beta_2$ and $\beta_3$ indicate the additive differential effects of type of firm and quality of sales management on the height of the regression line.

First-order  model  with  certain  interactions  added.  A  first-order  model 
with  interaction  effects  between  pairs  of  the  independent  variables  added  is: 

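(10.16)  $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_4 X_{i1}X_{i2} + \beta_5 X_{i1}X_{i3} + \beta_6 X_{i2}X_{i3} + \varepsilon_i$

Note the implications of this model:

                 Quality of
Type of          Sales
Firm             Management     Response Function

Incorp.          High           $E(Y) = (\beta_0 + \beta_2 + \beta_3 + \beta_6) + (\beta_1 + \beta_4 + \beta_5)X_1$
Not incorp.      High           $E(Y) = (\beta_0 + \beta_3) + (\beta_1 + \beta_5)X_1$
Incorp.          Low            $E(Y) = (\beta_0 + \beta_2) + (\beta_1 + \beta_4)X_1$
Not incorp.      Low            $E(Y) = \beta_0 + \beta_1 X_1$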


Not only are all response functions different for the various "type of firm-quality of sales management" combinations, but the differential effects of one qualitative variable on the intercept depend on the particular class of the other qualitative variable. For instance, when we move from "not incorporated-low quality" to "incorporated-low quality," the intercept changes by $\beta_2$. But if we move from "not incorporated-high quality" to "incorporated-high quality," the intercept changes by $\beta_2 + \beta_6$.

Qualitative  independent  variables  only 

Regression models containing only qualitative independent variables can also be constructed. With reference to our previous example, we could regress advertising expenditures only on type of firm and quality of sales management. The first-order model then would be:

(10.17)  $Y_i = \beta_0 + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i$

where $X_{i2}$ and $X_{i3}$ are defined in (10.14).

Comments 

1.  Models  in  which  all  independent  variables  are  qualitative  are  called  analysis  of 
variance  models. 

2.  Models containing some quantitative and some qualitative independent variables, where the chief independent variables of interest are qualitative and the quantitative independent variables are introduced primarily to reduce the variance of the error terms, are called covariance models.


10.4  OTHER  USES  OF  INDEPENDENT  INDICATOR  VARIABLES 

Comparison  of  two  or  more  regression  functions 

Frequently  we  encounter  regressions  for  two  or  more  populations  and  wish  to 
examine  their  similarities  and  differences.  We  present  two  examples. 

1.  An economist is studying the relation between amount of savings and level of income for middle-income families from urban and rural areas, based on independent samples from the two populations. Each of the two relations can be modeled by linear regression. She wishes to compare whether, at given income levels, urban and rural families tend to save the same amount, i.e., whether the two regression lines are the same. If they are not, she wishes to explore whether at least the amounts of savings out of an additional dollar of income are the same for the two groups, i.e., whether the slopes of the two regression lines are the same.

2.  A company has two instruments constructed to identical specifications to measure pressure in an industrial process. A study has been made for each instrument of the relation between its gauge readings and actual pressures as determined by an almost exact but slow and costly method. If the two regression lines are the same, a single calibration schedule can be developed for the two instruments; otherwise, two different calibration schedules would be required.




When  it  is  reasonable  to  assume  that  the  error  term  variances  in  the  regression 
models  for  the  different  populations  are  equal,  use  of  indicator  variables  permits 
us  to  test  the  equality  of  the  different  regression  functions.  If  the  error  variances 
are  not  equal,  transformations  may  equalize  them  at  least  approximately. 

We have already seen that regression models with indicator variables that contain interaction terms permit testing of the equality of regression functions for the different classes of a qualitative variable. This methodology can be used directly for testing the equality of regression functions for different populations. One simply considers the different populations studied as classes of an independent variable, defines indicator variables for the different populations, and develops a regression model containing appropriate interaction terms. Since no new principles arise in the testing of the equality of regression functions for different populations, we shall utilize the two earlier examples to illustrate the approach.

Example 1: Savings study. In the savings study example, the economist was willing to assume that the regression relation between savings (Y) and income ($X_1$) for middle-income families is linear for both rural and urban families, and that the error term variances for the two populations are the same. Since she wished to test whether the two regression lines are the same, she fitted model (10.6) which permits both the slopes and the intercepts to be different in the two regressions:

(10.18)  $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1}X_{i2} + \varepsilon_i$

where:

$X_{i1}$ = Family income

$X_{i2} = \begin{cases} 1 & \text{if urban family} \\ 0 & \text{otherwise} \end{cases}$

Identity of the regression functions is tested by considering the alternatives:

(10.19)

$H_0\colon \beta_2 = \beta_3 = 0$
$H_a\colon$ Not both $\beta_2 = 0$ and $\beta_3 = 0$

The appropriate test statistic is given by (8.32a):

(10.19a)  $$F^* = \frac{SSR(X_2, X_1X_2 \mid X_1)}{2} \div \frac{SSE(X_1, X_2, X_1X_2)}{n - 4}$$

where n represents the combined sample size for both populations.

If the economist only wished to test whether the slopes of the regression lines are the same, the alternatives would be:

(10.20)

$H_0\colon \beta_3 = 0$
$H_a\colon \beta_3 \neq 0$

and the appropriate test statistic is either the t* statistic (8.25) or the partial F test statistic (8.24):

(10.20a)  $$F^* = \frac{SSR(X_1X_2 \mid X_1, X_2)}{1} \div \frac{SSE(X_1, X_2, X_1X_2)}{n - 4}$$

If the economist wishes to estimate the difference in the slopes of the regression lines for urban and rural families, she would construct a confidence interval for $\beta_3$ in the usual fashion.


Example 2: Instrument calibration study. The engineer making the calibration study believed that the regression functions relating gauge reading (Y) and actual pressure ($X_1$) for both instruments are second-order polynomials:

$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2$

but that they might differ for the two instruments. Hence, he employed the model (using a deviation variable for $X_1$ to reduce multicollinearity problems; see Chapter 9):

(10.21)  $Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i1}^2 + \beta_3 X_{i2} + \beta_4 x_{i1}X_{i2} + \beta_5 x_{i1}^2 X_{i2} + \varepsilon_i$

where:

$x_{i1} = X_{i1} - \bar{X}_1$ = deviation of actual pressure

$X_{i2} = \begin{cases} 1 & \text{if instrument B} \\ 0 & \text{otherwise} \end{cases}$

Note that for instrument A, where $X_2 = 0$, the response function is:

(10.22a)  $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2$

and for instrument B, where $X_2 = 1$, the response function is:

(10.22b)  $E(Y) = (\beta_0 + \beta_3) + (\beta_1 + \beta_4)x_1 + (\beta_2 + \beta_5)x_1^2$

Hence, the test for equality of the two response functions involves the alternatives:

(10.23)

$H_0\colon \beta_3 = \beta_4 = \beta_5 = 0$
$H_a\colon$ Not all three $\beta_k = 0$ (k = 3, 4, 5)

and the appropriate test statistic is (8.32a):

(10.23a)  $$F^* = \frac{SSR(X_2, x_1X_2, x_1^2X_2 \mid x_1, x_1^2)}{3} \div \frac{SSE(x_1, x_1^2, X_2, x_1X_2, x_1^2X_2)}{n - 6}$$

where n represents the combined sample size for both populations.

Comments 


1.  The approach just described is completely general. If three or more populations are involved, additional indicator variables would simply be added to the model.

2.  The use of indicator variables for testing whether two or more regression functions are the same is equivalent to the general linear test approach where fitting the full model involves fitting separate regressions to the data from each population and fitting the reduced model involves fitting one regression to the combined data.
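
The equivalence stated in Comment 2 can be illustrated with a short sketch. The Python code below uses simulated savings data for two populations; the sample sizes, parameter values, and function names (`sse`, `general_linear_test`) are assumptions made for the example and do not come from the text.

```python
import numpy as np

def sse(X, y):
    """Error sum of squares from the least squares fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return float(r @ r)

def general_linear_test(X_full, X_reduced, y):
    """F* = [(SSE(R) - SSE(F)) / (df_R - df_F)] / [SSE(F) / df_F]."""
    n = len(y)
    df_full = n - X_full.shape[1]
    df_reduced = n - X_reduced.shape[1]
    sse_f, sse_r = sse(X_full, y), sse(X_reduced, y)
    return ((sse_r - sse_f) / (df_reduced - df_full)) / (sse_f / df_full)

# Simulated data (hypothetical): 15 rural and 15 urban families.
rng = np.random.default_rng(1)
income = rng.uniform(20.0, 60.0, size=30)
urban = np.repeat([0.0, 1.0], 15)
savings = (1.0 + 0.10 * income + urban * (0.5 + 0.02 * income)
           + rng.normal(0.0, 0.5, size=30))

ones = np.ones(30)
X_full = np.column_stack([ones, income, urban, income * urban])  # model (10.18)
X_reduced = np.column_stack([ones, income])                      # one common line

# Test of H0: beta_2 = beta_3 = 0, as in (10.19a).
f_star = general_linear_test(X_full, X_reduced, savings)

# Full-model SSE equals the sum of the SSEs from two separate regressions.
rural = urban == 0
sse_separate = (sse(X_reduced[rural], savings[rural])
                + sse(X_reduced[~rural], savings[~rural]))
sse_full = sse(X_full, savings)

print("F* =", f_star)
print("SSE(full) =", sse_full, " sum of separate SSEs =", sse_separate)
```

The value of F* would be compared with the appropriate percentile of the F distribution with 2 and n - 4 degrees of freedom.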




Piecewise  linear  regression 

Sometimes the regression of Y on X follows a particular linear relation in some range of X, but follows a different linear relation elsewhere. For instance, unit cost (Y) regressed on lot size may follow a certain linear regression up to $X_p = 500$, at which point the slope changes because of some operating efficiencies only possible with lot sizes of more than 500. Figure 10.6 illustrates this situation.

FIGURE 10.6  Illustration of piecewise linear regression

[Figure not reproduced. Vertical axis: Unit Cost.]


We  consider  now  how  indicator  variables  may  be  used  to  fit  piecewise  linear 
regressions  consisting  of  two  pieces.  We  take  up  the  case  where  Xp,  the  point 
where  the  slope  changes,  is  known. 

We return to our lot size illustration, for which it is known that the slope changes at $X_p = 500$. The model for our illustration may be expressed as follows:

(10.24)  $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 (X_{i1} - 500)X_{i2} + \varepsilon_i$

where:

$X_{i1}$ = Lot size

$X_{i2} = \begin{cases} 1 & \text{if } X_{i1} > 500 \\ 0 & \text{otherwise} \end{cases}$

To check that model (10.24) does provide a two-piecewise linear regression, consider the response function:

(10.25)  $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 (X_1 - 500)X_2$

When $X_1 \le 500$, $X_2 = 0$ so that (10.25) becomes:

(10.25a)  $E(Y) = \beta_0 + \beta_1 X_1$    $X_1 \le 500$

On the other hand, when $X_1 > 500$, $X_2 = 1$ and we obtain:

(10.25b)  $E(Y) = (\beta_0 - 500\beta_2) + (\beta_1 + \beta_2)X_1$    $X_1 > 500$

Thus, $\beta_1$ and $\beta_1 + \beta_2$ are the slopes of the two regression lines, and $\beta_0$ and $(\beta_0 - 500\beta_2)$ are the two Y intercepts. These parameters are shown in Figure 10.7.

FIGURE 10.7  Illustration of parameters of piecewise linear regression model (10.24)

[Figure not reproduced. Vertical axis: Unit Cost. Horizontal axis: Lot Size.]

Example. Table 10.6a contains eight observations on unit costs for given lot sizes. It is known that the response function slope changes at $X_p = 500$ so that model (10.24) is to be employed. Table 10.6b contains the X matrix for our example. The left column of X is a column of 1's as usual. The next column contains $X_{i1}$. The final column on the right contains $(X_{i1} - 500)X_{i2}$, which consists of 0's for all lot sizes up to 500, and of $X_{i1} - 500$ for all lot sizes above 500. The fitting of regression model (10.24) at this point becomes routine. The fitted response function is:

$\hat{Y} = 5.89545 - .00395X_1 - .00389(X_1 - 500)X_2$

From this fitted model, expected unit cost is estimated to decline by .00395 for a lot size increase of one when the lot size is less than 500 and by .00395 + .00389 = .00784 when the lot size is 500 or more.

TABLE 10.6  Data and X matrix for piecewise linear regression: lot size example

              (a)                              (b)

        Unit Cost     Lot
Lot     (dollars)     Size                  X matrix
 i      Yi            Xi1          X0     X1     (X1 - 500)X2

 1      2.57          650           1     650        150
 2      4.40          340           1     340          0
 3      4.52          400           1     400          0
 4      1.39          800           1     800        300
 5      4.75          300           1     300          0
 6      3.55          570           1     570         70
 7      2.49          720           1     720        220
 8      3.77          480           1     480          0
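
The piecewise fit for the lot size example can be sketched in a few lines. The Python code below (NumPy is an assumption of this illustration, as are the variable names) builds the X matrix of Table 10.6b from the data of Table 10.6a and computes the least squares estimates.

```python
import numpy as np

# Data from Table 10.6a: unit cost (Y) and lot size (X1).
cost = np.array([2.57, 4.40, 4.52, 1.39, 4.75, 3.55, 2.49, 3.77])
lot = np.array([650, 340, 400, 800, 300, 570, 720, 480], dtype=float)

x_p = 500.0                       # known point where the slope changes
x2 = (lot > x_p).astype(float)    # indicator: 1 if lot size exceeds 500

# Columns X0, X1, (X1 - 500) X2, as in Table 10.6b.
X = np.column_stack([np.ones_like(lot), lot, (lot - x_p) * x2])

# Least squares fit of model (10.24).
b, *_ = np.linalg.lstsq(X, cost, rcond=None)

slope_below = b[1]           # slope for lot sizes up to 500
slope_above = b[1] + b[2]    # slope for lot sizes above 500

print("b =", b)
print("slopes:", slope_below, slope_above)
```

The printed coefficients can be compared with the fitted response function given in the example.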

Note 

The extension of model (10.24) to more than two-piecewise regression lines is straightforward. For instance, if the slope of the regression line in our earlier lot size illustration actually changes at both X = 500 and X = 800, the model would be:

(10.26)  $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 (X_{i1} - 500)X_{i2} + \beta_3 (X_{i1} - 800)X_{i3} + \varepsilon_i$

where:

$X_{i1}$ = Lot size

$X_{i2} = \begin{cases} 1 & \text{if } X_{i1} > 500 \\ 0 & \text{otherwise} \end{cases}$

$X_{i3} = \begin{cases} 1 & \text{if } X_{i1} > 800 \\ 0 & \text{otherwise} \end{cases}$

Discontinuity  in  regression  function 

Sometimes the linear regression function may not only change its slope at some value $X_p$ but may also have a jump point there. Figure 10.8 illustrates this case. Another indicator variable must now be introduced to take care of the jump. Suppose time required to solve a task successfully (Y) is to be regressed on complexity of task ($X_1$), when complexity of task is measured on a quantitative scale from 0 to 100. It is known that the slope of the response line changes at $X_p = 40$, and it is believed that the regression relation may be discontinuous there. We therefore set up the model:

(10.27)  $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 (X_{i1} - 40)X_{i2} + \beta_3 X_{i3} + \varepsilon_i$

where:

$X_{i1}$ = Complexity of task

$X_{i2} = \begin{cases} 1 & \text{if } X_{i1} > 40 \\ 0 & \text{otherwise} \end{cases}$

$X_{i3} = \begin{cases} 1 & \text{if } X_{i1} > 40 \\ 0 & \text{otherwise} \end{cases}$

The response function for model (10.27) is:

(10.28)  $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 (X_1 - 40)X_2 + \beta_3 X_3$

When $X_1 \le 40$, then $X_2 = 0$ and $X_3 = 0$, so (10.28) becomes:

(10.28a)  $E(Y) = \beta_0 + \beta_1 X_1$    $X_1 \le 40$

FIGURE 10.8 Illustration of model (10.27) for discontinuous piecewise linear
regression
[Figure: response Y (vertical axis) plotted against X (horizontal axis)]




Similarly, when $X_1 > 40$, then $X_2 = 1$ and $X_3 = 1$, so (10.28) becomes:

(10.28b)  $E(Y) = (\beta_0 - 40\beta_2 + \beta_3) + (\beta_1 + \beta_2)X_1 \qquad X_1 > 40$

These two response functions are shown in Figure 10.8, together with the
parameters involved. Note that $\beta_3$ represents the difference in the mean
responses for the two regression lines at $X_p = 40$ and $\beta_2$ represents the
difference in the two slopes.

The estimation of the regression coefficients for model (10.27) presents no
new problems. One may test whether or not $\beta_3 = 0$ in the usual manner. If it is
concluded that $\beta_3 = 0$, the regression function is continuous at $X_p$ so that the
earlier piecewise linear regression model applies.
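
As a short check on the interpretation of the jump parameter, the two response lines of model (10.27) can be evaluated at the change point. The steps below use only the symbols already defined for (10.28); they are a worked restatement, not an additional result.

```latex
\begin{align*}
\text{line for } X_1 > 40 \text{ evaluated at } X_1 = 40:\quad
  & (\beta_0 - 40\beta_2 + \beta_3) + (\beta_1 + \beta_2)(40)
    = \beta_0 + 40\beta_1 + \beta_3 \\
\text{line for } X_1 \le 40 \text{ evaluated at } X_1 = 40:\quad
  & \beta_0 + 40\beta_1 \\
\text{difference at } X_p = 40:\quad
  & \beta_3
\end{align*}
```

The terms in $\beta_2$ cancel at $X_1 = 40$, so the vertical gap between the two lines at the change point is $\beta_3$ alone.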


Time  series  applications 

Economists  and  business  analysts  frequently  use  time  series  data  in  regression 
analysis.  For  instance,  savings  (Y)  may  be  regressed  on  income  (X),  where  both 
the  savings  and  income  data  pertain  to  a  number  of  years.  The  model  employed 
might  be: 

(10.29)  $Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t$

where $Y_t$ and $X_t$ are savings and income, respectively, for time period t. Suppose
that the period covered includes both peacetime and wartime years, and that this
factor should be recognized since it is anticipated that savings in wartime years
tend to be higher. The following model might then be appropriate:

(10.30)  $Y_t = \beta_0 + \beta_1 X_{t1} + \beta_2 X_{t2} + \varepsilon_t$

where:

$X_{t1}$ = Income
$X_{t2}$ = 1 if period t peacetime; 0 otherwise

Note that model (10.30) assumes that the marginal propensity to save ($\beta_1$) is
constant in both peacetime and wartime years, and that only the height of the
response curve is affected by this qualitative variable.

Another use of indicator variables in time series applications occurs when
monthly or quarterly data are used. Suppose that quarterly sales (Y) are regressed
on quarterly advertising expenditures ($X_1$) and quarterly disposable personal
income ($X_2$). If seasonal effects also have an influence on quarterly sales, a
first-order model incorporating seasonal effects would be:

(10.31)  $Y_t = \beta_0 + \beta_1 X_{t1} + \beta_2 X_{t2} + \beta_3 X_{t3} + \beta_4 X_{t4} + \beta_5 X_{t5} + \varepsilon_t$

where:

$X_{t1}$ = Quarterly advertising expenditures
$X_{t2}$ = Quarterly disposable personal income
$X_{t3}$ = 1 if first quarter; 0 otherwise
$X_{t4}$ = 1 if second quarter; 0 otherwise
$X_{t5}$ = 1 if third quarter; 0 otherwise
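
The design matrix for a model such as (10.31) might be assembled as in the following sketch. The function name `seasonal_design` and all numerical values are hypothetical illustrations, not data from the text; the fourth quarter is the class left without its own indicator.

```python
import numpy as np

def seasonal_design(advertising, income, quarter):
    """Hypothetical helper: columns [1, X1, X2, Q1, Q2, Q3] as in model (10.31)."""
    advertising = np.asarray(advertising, dtype=float)
    income = np.asarray(income, dtype=float)
    quarter = np.asarray(quarter)
    # One 0, 1 indicator for each of the first three quarters.
    indicators = [(quarter == q).astype(float) for q in (1, 2, 3)]
    return np.column_stack(
        [np.ones_like(advertising), advertising, income, *indicators]
    )

# Illustrative values only (two years of quarterly observations).
X = seasonal_design(
    advertising=[10, 12, 9, 15, 11, 13, 10, 16],
    income=[200, 205, 207, 212, 215, 219, 221, 226],
    quarter=[1, 2, 3, 4, 1, 2, 3, 4],
)
print(X)
```

Rows for the fourth quarter carry zeros in all three indicator columns, so the intercept refers to that quarter and the three indicator coefficients measure differences from it.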

10.5  SOME  CONSIDERATIONS  IN  USING  INDEPENDENT 
INDICATOR  VARIABLES 

Indicator  variables  versus  allocated  codes 

An alternative to the use of indicator variables for a qualitative independent
variable is to employ allocated codes. Consider, for instance, the independent
variable "frequency of product use" which has three classes: frequent user,
occasional user, nonuser. With the allocated codes approach, a single
independent variable is employed and values are assigned to the classes; for instance:


Class              $X_1$

Frequent user        3
Occasional user      2
Nonuser              1


The  allocated  codes  are,  of  course,  arbitrary  and  could  be  other  sets  of  numbers. 
The  model  with  allocated  codes  for  our  example,  assuming  no  other  independent 
variables,  would  be: 

(10.32)  $Y_i = \beta_0 + \beta_1 X_{i1} + \varepsilon_i$

The  basic  difficulty  with  allocated  codes  is  that  they  define  a  metric  for  the 
classes  of  the  qualitative  variable  which  may  not  be  reasonable.  To  see  this 
concretely,  consider  the  mean  responses  with  model  (10.32)  for  the  three  classes 
of  the  qualitative  variable: 

Class              E(Y)

Frequent user      $E(Y) = \beta_0 + \beta_1(3) = \beta_0 + 3\beta_1$
Occasional user    $E(Y) = \beta_0 + \beta_1(2) = \beta_0 + 2\beta_1$
Nonuser            $E(Y) = \beta_0 + \beta_1(1) = \beta_0 + \beta_1$

Note the key implication:

$E(Y \mid \text{frequent user}) - E(Y \mid \text{occasional user}) = E(Y \mid \text{occasional user}) - E(Y \mid \text{nonuser}) = \beta_1$

Thus,  the  coding  1,  2,  3  implies  equal  distances  between  the  three  user  classes, 
which  may  not  be  in  accord  with  reality.  Other  allocated  codes  may,  of  course, 
imply  different  spacings  of  the  classes  of  the  qualitative  variable,  but  these  would 
ordinarily  still  be  arbitrary. 




Indicator variables, in contrast, make no assumptions about the spacing of the
classes and rely on the data to show the differential effects that occur. If, for the
same example, two indicator variables, say, $X_1$ and $X_2$, are employed to
represent the qualitative variable, as follows:

Class              $X_1$   $X_2$

Frequent user        1       0
Occasional user      0       1
Nonuser              0       0

the model would be:

(10.33)  $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$

Here $\beta_1$ measures the differential effect:

$E(Y \mid \text{frequent user}) - E(Y \mid \text{nonuser})$

and $\beta_2$ measures:

$E(Y \mid \text{occasional user}) - E(Y \mid \text{nonuser})$

Thus, $\beta_1$ and $\beta_2$ measure the differential effects of user status relative to the
class of nonusers without any arbitrary restrictions to be satisfied by these
differential effects.
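
The contrast between the two approaches can be seen numerically. The sketch below fits model (10.32) with allocated codes and model (10.33) with indicator variables to a small set of hypothetical responses (invented for illustration, not taken from the text) in which the class means are deliberately unequally spaced.

```python
import numpy as np

# Hypothetical responses: three observations per class, class means 10, 4, 3.
classes = np.array(["frequent"] * 3 + ["occasional"] * 3 + ["nonuser"] * 3)
y = np.array([9.0, 10.0, 11.0, 3.5, 4.0, 4.5, 2.0, 3.0, 4.0])

def fit(X, y):
    """Ordinary least squares coefficients."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

ones = np.ones(len(y))

# Allocated codes 3, 2, 1 as in model (10.32).
code_of = {"frequent": 3.0, "occasional": 2.0, "nonuser": 1.0}
codes = np.array([code_of[c] for c in classes])
b_codes = fit(np.column_stack([ones, codes]), y)
means_codes = {c: b_codes[0] + b_codes[1] * k for c, k in code_of.items()}

# Indicator variables as in model (10.33); nonuser is the reference class.
x1 = (classes == "frequent").astype(float)
x2 = (classes == "occasional").astype(float)
b_ind = fit(np.column_stack([ones, x1, x2]), y)
means_ind = {
    "frequent": b_ind[0] + b_ind[1],
    "occasional": b_ind[0] + b_ind[2],
    "nonuser": b_ind[0],
}

print("allocated codes:", means_codes)
print("indicator variables:", means_ind)
```

With allocated codes the fitted class means are forced to be equally spaced, whatever the data show; with indicator variables the fitted means reproduce the observed class means.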

Indicator  variables  versus  quantitative  variables 

If an independent variable is quantitative, such as age, one can nevertheless
use indicator variables instead. For instance, the quantitative variable age may be
transformed by grouping ages into classes such as under 21, 21-34, 35-49, etc.
Indicator variables may then be used for the classes of this new independent
variable. At first sight, this may seem to be a questionable approach because
information about the actual ages is thrown away. Furthermore, additional
parameters are placed into the model, which leads to a reduction of the degrees of
freedom associated with MSE.

Nevertheless, there are occasions when replacement of a quantitative variable
by indicator variables may be appropriate. Consider, for example, a large-scale
survey in which the relation between liquid assets (Y) and age (X) of head of
household is to be studied. Two thousand households will be included in the
study, so that the loss of 10 or 20 degrees of freedom is immaterial. The analyst
is very much in doubt about the shape of the regression function, which could be
highly complex, and hence may prefer the indicator variable approach in order to
obtain information about the shape without making any assumptions about the
functional form of the regression function.
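
Grouping a quantitative variable into classes and coding the classes with indicator variables might look like the sketch below. The function name `age_class_indicators` is hypothetical, and the final open-ended class (50 and over) is an assumption made here to complete the "etc." in the text.

```python
import numpy as np

def age_class_indicators(age, edges=(21, 35, 50)):
    """Hypothetical helper: 0, 1 indicators for age classes.

    Classes: under 21, 21-34, 35-49, and (assumed here) 50 and over.
    Returns c - 1 indicator columns; the last class is the reference.
    """
    age = np.asarray(age)
    # digitize gives 0 for age < 21, 1 for 21 <= age < 35, and so on.
    class_index = np.digitize(age, edges)
    n_classes = len(edges) + 1
    return np.column_stack(
        [(class_index == k).astype(float) for k in range(n_classes - 1)]
    )

ages = np.array([19, 21, 34, 35, 49, 50, 72])
D = age_class_indicators(ages)
print(D)
```

Each row has at most a single 1, and rows for the reference class are all zeros, in line with the use of c - 1 indicator variables for c classes.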

Another alternative, also utilizing indicator variables, is available to the
analyst in doubt about the functional form of a possibly complex regression
function. He or she could use the quantitative variable age, but employ piecewise
linear regression with a number of pieces. Again, this approach loses degrees of
freedom for estimating MSE, but this is of no concern in large-scale studies. The
benefit would be that information about the shape of the regression function is
obtained without making strong assumptions about its functional form.


Other  codings  for  indicator  variables 


As stated earlier, many different codings of indicator variables are possible.
We mention here two alternative codings to our 0, 1 coding with $c - 1$ indicator
variables for a qualitative variable with c classes.

For the insurance innovation example, where Y is time to adopt an innovation,
$X_1$ is size of insurance firm, and the second independent variable is type of
company (stock, mutual), we could use the coding:


(10.34)  $X_2$ = 1 if stock company; -1 if mutual company

In that case, the first-order linear regression model would be:

(10.35)  $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$

which has response function:

(10.36)  $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$

This response function is as follows for the two types of companies:

(10.36a)  $E(Y) = (\beta_0 + \beta_2) + \beta_1 X_1$    Stock firms
(10.36b)  $E(Y) = (\beta_0 - \beta_2) + \beta_1 X_1$    Mutual firms

Thus, $\beta_0$ here may be viewed as an "average" intercept of the regression line,
from which the stock company and mutual company intercepts differ by $\beta_2$ in
opposite directions. A test whether the regression lines are the same for both
types of companies involves $H_0$: $\beta_2 = 0$, $H_a$: $\beta_2 \ne 0$.

A second alternative coding scheme is to use a 0, 1 indicator variable for each
of the c classes of the qualitative variable and drop the intercept term in the
regression model. For our insurance innovation example, we would have:

(10.37)  $Y_i = \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i$

where:

$X_{i1}$ = Size of firm
$X_{i2}$ = 1 if stock company; 0 otherwise
$X_{i3}$ = 1 if mutual company; 0 otherwise

Here, the two response functions would be:

(10.38a)  $E(Y) = \beta_2 + \beta_1 X_1$    Stock firms
(10.38b)  $E(Y) = \beta_3 + \beta_1 X_1$    Mutual firms

A test of whether or not the two regression lines are the same would involve the
alternatives $H_0$: $\beta_2 = \beta_3$, $H_a$: $\beta_2 \ne \beta_3$. This type of test is discussed in Section
8.4.
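
The codings discussed above describe the same pair of regression lines with different parameters. The following sketch illustrates this on hypothetical data (the firm sizes, company types, and simulated responses are invented for illustration and are not the insurance innovation data).

```python
import numpy as np

# Hypothetical data: firm size and company type (1 = stock, 0 = mutual).
size = np.array([150.0, 90.0, 240.0, 30.0, 120.0, 200.0, 60.0, 180.0])
is_stock = np.array([1, 0, 1, 0, 0, 1, 1, 0], dtype=float)
rng = np.random.default_rng(0)
y = 35.0 - 0.1 * size + 8.0 * is_stock + rng.normal(scale=2.0, size=size.size)

def fit(X):
    """Ordinary least squares coefficients for response y."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

ones = np.ones_like(size)
b_01 = fit(np.column_stack([ones, size, is_stock]))              # 0, 1 coding
b_pm = fit(np.column_stack([ones, size, 2.0 * is_stock - 1.0]))  # 1, -1 coding (10.34)
b_ni = fit(np.column_stack([size, is_stock, 1.0 - is_stock]))    # no intercept (10.37)

# Intercepts of the stock and mutual lines under each coding.
stock = (b_01[0] + b_01[2], b_pm[0] + b_pm[2], b_ni[1])
mutual = (b_01[0], b_pm[0] - b_pm[2], b_ni[2])
print("stock intercepts:", stock)
print("mutual intercepts:", mutual)
```

All three codings give the same fitted lines; in the 1, -1 coding the intercept is the average of the two line intercepts, as described for (10.36a) and (10.36b).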

10.6  DEPENDENT  INDICATOR  VARIABLE 

In  a  variety  of  applications,  the  dependent  variable  of  interest  has  only  two 
possible  outcomes,  and  therefore  can  be  represented  by  an  indicator  variable 
taking  on  values  0  and  1. 

1.  In  an  analysis  of  whether  or  not  business  firms  have  an  industrial  relations 
department,  according  to  size  of  firm,  the  dependent  variable  was  defined  to 
have  the  two  possible  outcomes:  firm  has  industrial  relations  department,  firm 
does  not  have  industrial  relations  department.  These  outcomes  may  be  coded  1 
and  0,  respectively  (or  vice  versa). 

2.  In  a  study  of  labor  force  participation  of  wives,  as  a  function  of  age  of 
wife,  number  of  children,  and  husband’s  income,  the  dependent  variable  Y  was 
defined  to  have  the  two  possible  outcomes:  wife  in  labor  force,  wife  not  in  labor 
force.  Again,  these  outcomes  may  be  coded  1  and  0,  respectively. 

3. In a study of liability insurance possession, according to age of head of
household, amount of liquid assets, and type of occupation of head of household,
the dependent variable Y was defined to have the two possible outcomes:
household has liability insurance policy, household does not have liability insurance
policy. These outcomes again can be coded 1 and 0, respectively.

These  examples  show  the  wide  range  of  applications  in  which  the  dependent 
variable  is  dichotomous,  and  hence  may  be  represented  by  an  indicator  variable. 
A  dependent  dichotomous  variable,  taking  on  the  values  0  and  1,  is  sometimes 
said  to  involve  quantal  responses  or  binary  responses. 

Meaning  of  response  function  when 
dependent  variable  is  binary 

Consider  the  simple  linear  regression  model: 

(10.39)  $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \qquad Y_i = 0, 1$

where the responses $Y_i$ are binary 0, 1 observations. The expected response
$E(Y_i)$ has a special meaning in this case. Since $E(\varepsilon_i) = 0$ as usual, we have:

(10.40)  $E(Y_i) = \beta_0 + \beta_1 X_i$

Consider now $Y_i$ as an ordinary Bernoulli random variable for which we can state
the probability distribution as follows:

$Y_i$     Probability

 1        $P(Y_i = 1) = \pi_i$
 0        $P(Y_i = 0) = 1 - \pi_i$

Thus, $\pi_i$ is the probability that $Y_i = 1$ and $1 - \pi_i$ is the probability that $Y_i = 0$.
By the ordinary definition of expected value of a random variable in (1.12), we
obtain:

(10.41)  $E(Y_i) = 1(\pi_i) + 0(1 - \pi_i) = \pi_i$

Equating (10.40) and (10.41), we thus find:

(10.42)  $E(Y_i) = \beta_0 + \beta_1 X_i = \pi_i$

The mean response $E(Y_i) = \beta_0 + \beta_1 X_i$ as given by the response function is
therefore simply the probability that $Y_i = 1$ when the level of the independent
variable is $X_i$. This interpretation of the mean response applies whether the
response function is a simple linear one, as here, or a complex multiple regression
one. The mean response, when the dependent variable is a 0,1 indicator variable,
always represents the probability that Y = 1 for the given levels of the
independent variables. Figure 10.9 illustrates a simple linear response function for a
dependent indicator variable. Here, the indicator variable Y refers to whether or
not a firm has an industrial relations department, and the independent variable X
is size of firm. The response function in Figure 10.9 shows the probability that
firms of given size have an industrial relations department.

FIGURE 10.9 Illustration of response function when dependent variable is binary:
industrial relations department example
[Figure: probability that firm has industrial relations department, E(Y) (vertical
axis), plotted against size of firm (horizontal axis)]

Special  problems  when  dependent  variable  is  binary 

Special problems arise, unfortunately, when the dependent variable is an
indicator variable. We shall consider three of these now, using a simple linear
regression model as an illustration.

1. Nonnormal error terms. For a binary 0,1 dependent variable, the error
terms $\varepsilon_i = Y_i - (\beta_0 + \beta_1 X_i)$ can take on only two values:

(10.43a)  When $Y_i = 1$:  $\varepsilon_i = 1 - \beta_0 - \beta_1 X_i$

(10.43b)  When $Y_i = 0$:  $\varepsilon_i = -\beta_0 - \beta_1 X_i$

Clearly, the normal error regression model (3.1), assuming that the $\varepsilon_i$ are
normally distributed, is not appropriate.

2. Nonconstant error variance. Another problem with the error terms $\varepsilon_i$ is
that they do not have equal variances when the dependent variable is an indicator
variable. To see this, we shall obtain $\sigma^2(Y_i)$ for the simple linear regression
model (10.39):

$\sigma^2(Y_i) = E\{[Y_i - E(Y_i)]^2\} = (1 - \pi_i)^2 \pi_i + (0 - \pi_i)^2 (1 - \pi_i)$

or:

(10.44)  $\sigma^2(Y_i) = \pi_i(1 - \pi_i) = [E(Y_i)][1 - E(Y_i)]$

The variance of $\varepsilon_i$ is, of course, the same as that of $Y_i$ because $\varepsilon_i = Y_i - \pi_i$ and
$\pi_i$ is a constant:

(10.45)  $\sigma^2(\varepsilon_i) = \pi_i(1 - \pi_i) = [E(Y_i)][1 - E(Y_i)]$

or:

(10.45a)  $\sigma^2(\varepsilon_i) = (\beta_0 + \beta_1 X_i)(1 - \beta_0 - \beta_1 X_i)$

Note from (10.45a) that $\sigma^2(\varepsilon_i)$ depends on $X_i$. Hence, the error term variances
will differ at different levels of X, and ordinary least squares will no longer be
optimal.

3. Constraints on response function. Since the response function
represents probabilities when the dependent variable is a 0,1 indicator variable, the
mean responses should be constrained as follows:

(10.46)  $0 \le E(Y) = \pi \le 1$

Many response functions do not automatically possess this constraint. A linear
response function, for instance, may fall outside the constraint limits within the
range of the independent variable in the scope of the model.
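
The dependence of the error variance on the mean response, noted under problem 2, can be tabulated directly. The sketch below evaluates the two-point expectation leading to (10.44) and the closed form $\pi(1 - \pi)$ at a few probabilities chosen here purely for illustration.

```python
import numpy as np

def variance_by_definition(pi):
    """E{[Y - E(Y)]^2} for a 0, 1 response with P(Y = 1) = pi."""
    pi = np.asarray(pi, dtype=float)
    return (1.0 - pi) ** 2 * pi + (0.0 - pi) ** 2 * (1.0 - pi)

def variance_closed_form(pi):
    """pi * (1 - pi), as in (10.44)."""
    pi = np.asarray(pi, dtype=float)
    return pi * (1.0 - pi)

pi = np.array([0.05, 0.2, 0.5, 0.8, 0.95])  # illustrative probabilities
var = variance_closed_form(pi)
for p, v in zip(pi, var):
    print(f"pi = {p:.2f}  variance = {v:.4f}")
```

The variance is largest at $\pi = .5$ and shrinks toward 0 as $\pi$ approaches 0 or 1, which is why the error variances cannot be equal when the mean response changes with X.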




These  three  problems  create  difficulties,  but  solutions  can  be  found.  Problem 
2  concerning  unequal  error  variances,  for  instance,  can  be  handled  by  using 
weighted  least  squares.  Problem  3  about  the  constraints  on  the  response  function 
can  be  handled  by  making  sure  that  the  mean  responses  for  the  fitted  model  do 
not  fall  below  0  or  above  1  for  levels  of  X  within  the  scope  of  the  model,  or  else 
by  using  a  model  which  automatically  meets  the  constraints. 

Finally, even though the error terms are not normal when the dependent
variable is binary, the method of least squares still provides unbiased estimators that
are asymptotically normal under quite general conditions. Hence, when the
sample size is large, inferences concerning the regression coefficients and mean
responses can be made in the same fashion as when the error terms are assumed
to be normally distributed.


10.7  LINEAR  REGRESSION  WITH  DEPENDENT 
INDICATOR  VARIABLE 

We consider now the fitting of linear response functions when the dependent
variable is binary. Then we shall explain the fitting of curvilinear response
functions.


Illustration

A systems analyst studied the effect of computer programming experience on
ability to complete a complex programming task, including debugging, within a
specified time. Twenty-five persons were selected for the study. They had
varying amounts of programming experience (measured in months of experience), as
shown in Table 10.7, column 1. All persons were given the same programming
task, and the results of their success in the task are shown in Table 10.7, column
2. The results are coded in binary fashion: if the task was completed successfully
in the allotted time, Y = 1, and if the task was not completed successfully,
Y = 0. Figure 10.10 contains a scatter plot of the data. This plot is not too
informative because of the nature of the dependent variable, other than to
indicate that ability to complete the task successfully appears to increase with amount
of experience. At this point, it was decided to fit the linear regression model
(10.39).


Ordinary  least  squares  fit 

One  approach  to  fitting  model  (10.39)  is  to  fit  it  by  ordinary  least  squares 
despite  the  unequal  error  variances.  The  estimated  regression  coefficients  will 
still  be  unbiased,  but  they  will  no  longer  have  the  minimum  variance  property 
among  the  class  of  unbiased  linear  estimators.  Thus,  the  use  of  ordinary  least 
squares  may  lead  to  inefficient  estimates,  i.e.,  estimates  with  larger  variances 
than  could  be  obtained  with  weighted  procedures. 




TABLE 10.7 Data for programming task example

               (1)           (2)         (3)          (4)
            Months of       Task
 Person     Experience     Success
   i         $X_i$          $Y_i$     $\hat{Y}_i$    $w_i'$

   1           14             0         .34920       4.4003
   2           29             0         .82212       6.8382
   3            6             0         .09697      11.4196
   4           25             1         .69601       4.7263
   5           18             1         .47531       4.0098
   6            4             0         .03392      30.5196
   7           18             0         .47531       4.0098
   8           12             0         .28614       4.8956
   9           22             1         .60142       4.1717
  10            6             0         .09697      11.4196
  11           30             1         .85365       8.0044
  12           11             0         .25461       5.2691
  13           30             1         .85365       8.0044
  14            5             0         .06544      16.3502
  15           20             1         .53837       4.0237
  16           13             0         .31767       4.6135
  17            9             0         .19156       6.4573
  18           32             1         .91671      13.0967
  19           24             0         .66448       4.4854
  20           13             1         .31767       4.6135
  21           19             0         .50684       4.0007
  22            4             0         .03392      30.5196
  23           28             1         .79059       6.0403
  24           22             1         .60142       4.1717
  25            8             1         .16003       7.4394

The  ordinary  least  squares  fit  to  the  data  in  Table  10.7  leads  to  the  following 
results: 

Regression      Estimated Regression      Estimated Standard
Coefficient     Coefficient               Deviation

$\beta_0$       $b_0 = -.092197$          $s(b_0) = .183272$
$\beta_1$       $b_1 = .031528$           $s(b_1) = .009606$

The estimated response function therefore is:

(10.47)  $\hat{Y} = -.092197 + .031528X$

This response function is shown in Figure 10.10. It may be used in the ordinary
manner. For instance, the estimated mean response for persons with $X_h = 14$
months experience is:

$\hat{Y}_h = -.092197 + .031528(14) = .34920$

Thus,  we  estimate  that  the  probability  is  .349  that  a  person  with  14  months 
experience  will  successfully  complete  the  programming  task. 


FIGURE 10.10 Scatter plot and fitted regression line: programming task example
[Figure: vertical axis is Success in Task]


As noted earlier from the scatter plot, this probability increases with
increasing experience. The coefficient $b_1$ indicates that the estimated probability
increases by .0315 for each additional month of experience. Clearly, the model
cannot be extrapolated much beyond the range of the data here or else the
estimated probabilities would be negative or exceed 1, which are meaningless
results. For $X_h = 40$ months experience, for instance, $\hat{Y}_h = 1.17$.


Weighted  least  squares  fit 

It was pointed out in Chapter 5 that weighted least squares provides efficient
estimates when the error variances are unequal. Since we know from (10.45)
that:

$\sigma^2(\varepsilon_i) = \pi_i(1 - \pi_i) = [E(Y_i)][1 - E(Y_i)]$

it would appear that using weighted least squares with weights:

(10.48)  $w_i = \frac{1}{\pi_i(1 - \pi_i)} = \frac{1}{[E(Y_i)][1 - E(Y_i)]}$

is the proper approach to take. There exists a difficulty, however, in carrying this
out, namely, that the $w_i$ in (10.48) involve $E(Y_i)$ and hence the parameters $\beta_0$ and
$\beta_1$, which are unknown and must be estimated.

A way out of this difficulty is to use a two-stage least squares procedure:

1. Stage 1: fit the regression model by ordinary least squares.

2. Stage 2: estimate the weights $w_i$ from the results of stage 1:

(10.49)  $w_i' = \frac{1}{\hat{Y}_i(1 - \hat{Y}_i)}$

and then use these estimated weights in the matrix W in (7.70) for obtaining
weighted least squares estimates for the regression model by means of
(7.71).


To apply this approach in our example, we first need to find the estimated
weights $w_i'$ from the ordinary least squares fit. This ordinary least squares fit has
been obtained earlier in (10.47). Hence, we are ready to calculate $\hat{Y}_i$ and then
$1/[\hat{Y}_i(1 - \hat{Y}_i)]$ for each observation. For the first observation, for instance,
$X_1 = 14$ so that $\hat{Y}_1 = .34920$, as found earlier. Hence, the estimated weight for
the first observation is:

$w_1' = \frac{1}{.34920(.65080)} = 4.4003$

In the same manner, the other weights are calculated. In Table 10.7, columns 3
and 4 contain the fitted values and the estimated weights, respectively.

To  obtain  the  weighted  least  squares  estimates  for  our  example,  we  utilized  a 
computer  run  with  the  weights  as  given  in  Table  10.7.  This  led  to  the  following 
results: 


Regression      Estimated Regression      Estimated Standard
Coefficient     Coefficient               Deviation

$\beta_0$       $b_0 = -.117113$          $s(b_0) = .111841$
$\beta_1$       $b_1 = .032672$           $s(b_1) = .006644$

The fitted response function by the two-stage least squares approach therefore is:

(10.50)  $\hat{Y} = -.117113 + .032672X$

Note  that  this  estimated  response  function  does  not  differ  markedly  from  the 
one  obtained  by  unweighted  least  squares  in  (10.47),  although  the  estimated 
standard  deviations  of  the  regression  coefficients  are  now  somewhat  smaller. 
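
The two-stage procedure can be sketched with NumPy using the data of Table 10.7, columns 1 and 2. This is an illustrative sketch, not the computer run used in the text; the helper `wls` is a hypothetical name, and it carries out weighted least squares by scaling each row by the square root of its weight.

```python
import numpy as np

# Table 10.7: months of experience (column 1) and task success (column 2).
x = np.array([14, 29, 6, 25, 18, 4, 18, 12, 22, 6, 30, 11, 30,
              5, 20, 13, 9, 32, 24, 13, 19, 4, 28, 22, 8], dtype=float)
y = np.array([0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1,
              0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1], dtype=float)
X = np.column_stack([np.ones_like(x), x])

def wls(X, y, w):
    """Hypothetical helper: weighted least squares via row scaling by sqrt(w)."""
    sw = np.sqrt(w)
    b, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return b

# Stage 1: ordinary least squares (all weights equal to 1).
b_ols = wls(X, y, np.ones_like(y))
y_hat = X @ b_ols

# Stage 2: estimated weights (10.49), then weighted least squares.
w = 1.0 / (y_hat * (1.0 - y_hat))
b_wls = wls(X, y, w)

print("stage 1 (OLS):", b_ols)
print("stage 2 (WLS):", b_wls)
```

The stage 1 and stage 2 coefficients can be compared with (10.47) and (10.50). Note that the weights in (10.49) are defined only when every fitted value lies strictly between 0 and 1, which holds for these data.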




Comments 


1.  If  the  mean  responses  range  between  about  .2  and  .8  for  the  scope  of  the  model, 
there  is  little  to  be  gained  from  weighted  least  squares  since  the  error  variances  will  differ 
but  little.  Only  if  the  mean  responses  range  below  .2  and/or  above  .8  will  the  error 
variances  be  sufficiently  unequal  to  make  weighted  least  squares  worthwhile. 

2.  In  our  example,  the  fitted  response  function  does  not  fall  below  0  or  above  1  within 
the  range  of  the  data  (4  months  to  32  months  experience).  If  it  did,  a  curvilinear  response 
function  would  have  to  be  employed.  We  discuss  one  important  curvilinear  model  below. 

3.  The  weighted  least  squares  approach  could  employ  additional  stages,  with  the 
weights  refined  at  each  stage.  Usually,  however,  the  gain  from  additional  iterations  is  not 
large.  For  our  example,  for  instance,  another  iteration  led  to  the  following  results: 

$b_0 = -.125665$     $s(b_0) = .084555$

$b_1 = .033075$      $s(b_1) = .005885$

Note that there is relatively little decrease from the previous stage in the estimated
standard deviations and only small changes in the regression coefficients.

4. If there are repeat observations at the different X levels, the procedure of fitting a
linear response function can be simplified. Let $X_1, \ldots, X_c$ denote the X levels, $p_j$ the
proportion of 1's at $X_j$ ($j = 1, \ldots, c$), and $n_j$ the number of observations at $X_j$. Fitting the
sample proportions $p_j$ with weights:

(10.51)  $w_j = n_j$

leads to the identical estimated response function as ordinary least squares applied to the
individual Y observations. If there are hundreds of Y observations located at a limited
number of $X_j$ levels, much computational effort can be saved by fitting the sample
proportions $p_j$.

If weighted least squares estimates are to be obtained, only one stage is required when
sample proportions $p_j$ are available. The weights would be as follows:

(10.52)  $w_j' = \frac{n_j}{p_j(1 - p_j)}$

where $w_j'$ denotes the estimated weight.

We  discuss  fitting  a  regression  model  to  grouped  data  more  fully  below. 

5.  While  we  have  illustrated  only  simple  linear  regression,  the  extension  to  multiple 
regression  is  straightforward. 
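
The equivalence stated in comment 4 can be checked numerically. The sketch below groups the Table 10.7 observations by X level, fits the sample proportions with weights $n_j$ as in (10.51), and compares the result with ordinary least squares on the individual observations. The helper name `wls` is hypothetical.

```python
import numpy as np

# Table 10.7: months of experience and task success.
x = np.array([14, 29, 6, 25, 18, 4, 18, 12, 22, 6, 30, 11, 30,
              5, 20, 13, 9, 32, 24, 13, 19, 4, 28, 22, 8], dtype=float)
y = np.array([0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1,
              0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1], dtype=float)

def wls(X, y, w):
    """Hypothetical helper: weighted least squares via row scaling by sqrt(w)."""
    sw = np.sqrt(w)
    b, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return b

# Ordinary least squares on the 25 individual observations.
b_individual = wls(np.column_stack([np.ones_like(x), x]), y, np.ones_like(y))

# Group by X level: n_j observations and proportion p_j of 1's at each level.
levels, inverse, n_j = np.unique(x, return_inverse=True, return_counts=True)
p_j = np.bincount(inverse, weights=y) / n_j

# Fit the proportions with weights w_j = n_j, as in (10.51).
X_grouped = np.column_stack([np.ones_like(levels), levels])
b_grouped = wls(X_grouped, p_j, n_j.astype(float))

print("individual observations:", b_individual)
print("grouped proportions:    ", b_grouped)
```

In this small data set most levels have only one or two observations, so many $p_j$ equal 0 or 1 and the weights in (10.52) would be undefined; the one-stage weighted fit is therefore suited to data with many observations per level.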


10.8  LOGISTIC  RESPONSE  FUNCTION 

Both theoretical and empirical considerations suggest that when the dependent
variable is an indicator variable, the shape of the response function will
frequently be curvilinear. Figure 10.11 contains a curvilinear response function
which has been found appropriate in many instances involving a binary
dependent variable. Note that this response function is shaped like a tilted S, and that it
has asymptotes at 0 and 1. The latter feature assures that the constraints on E(Y)
in (10.46) are automatically met. The response function plotted in Figure 10.11
is called the logistic function and is given by:

(10.53)  $E(Y) = \frac{\exp(\beta_0 + \beta_1 X)}{1 + \exp(\beta_0 + \beta_1 X)}$

FIGURE 10.11 Example of logistic response function


An interesting property of the logistic function is that it can easily be linearized.
Let us denote E(Y) by $\pi$, since the mean response is a probability when the
dependent variable is a 0,1 indicator variable. Then if we make the
transformation:

(10.54)  $\pi' = \log_e\left(\frac{\pi}{1 - \pi}\right)$

we obtain from (10.53):

(10.55)  $\pi' = \beta_0 + \beta_1 X$

The transformation (10.54) is called the logistic or logit transformation of the
probability $\pi$.
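
The step from (10.53) to (10.55) can be written out as follows, using only the quantities already defined. This is a worked restatement of the linearization rather than a new result.

```latex
\begin{align*}
1 - \pi &= 1 - \frac{\exp(\beta_0 + \beta_1 X)}{1 + \exp(\beta_0 + \beta_1 X)}
         = \frac{1}{1 + \exp(\beta_0 + \beta_1 X)} \\
\frac{\pi}{1 - \pi} &= \exp(\beta_0 + \beta_1 X) \\
\pi' = \log_e\!\left(\frac{\pi}{1 - \pi}\right) &= \beta_0 + \beta_1 X
\end{align*}
```

The ratio $\pi/(1 - \pi)$ is positive for any $\pi$ strictly between 0 and 1, so the logarithm is always defined on that range.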


Fitting  of  logistic  function 

The  fitting  of  the  transformed  logistic  response  function  (10.55)  is  relatively 
simple when there are repeat observations at each X level. We now explain the
fitting procedure for this case, which arises frequently in practice. We cite two
examples: 

1. A pricing experiment involves showing a new product to a consumer,
providing information about it, and then asking the consumer whether he or she
would buy the product at a given price. Five prices are studied, and n
persons are exposed to a given price. The dependent variable is binary
(would purchase, would not purchase); the independent variable is price and
has five classes.

2. Four hundred heads of households are asked to indicate on a 10-point scale
their intent to buy a new car within the next 12 months. A year later, each
household is interviewed to determine whether a new car was purchased.
The dependent variable is binary (did purchase, did not purchase). The
independent variable is the measure of intent to buy and has 10 classes.

We shall denote the X levels at which observations are obtained by $X_1, \ldots, X_c$.
The number of observations at level $X_j$ will be denoted by $n_j$ ($j = 1, \ldots, c$).
It can be shown that we need consider only the total number of 1's at each X
level, and not the individual Y values. Let $R_j$ be the number of 1's at the level $X_j$.
Hence, the proportion of 1's at the level $X_j$, denoted by $p_j$, is:

(10.56)  $p_j = \frac{R_j}{n_j}$

Table 10.8 on page 365 contains the $X_j$, $n_j$, $R_j$, and $p_j$ values for an example we
shall discuss shortly. $X_j$ refers to the price reduction offered by a coupon, $n_j$ is
the number of households which received a coupon with price reduction $X_j$, $R_j$ is
the number of these households that redeemed the coupon, and $p_j$ is the
proportion of households receiving a coupon with price reduction $X_j$ that did redeem the
coupon.

We fit the transformed logistic response function (10.55):

$\pi' = \beta_0 + \beta_1 X$

by making the logistic transformation (10.54) on the sample proportions:

(10.57)  $p_j' = \log_e\left(\frac{p_j}{1 - p_j}\right)$

and using $p_j'$ as the dependent variable. If transformation (10.57) is to be carried
out manually, pocket calculators with $\log_e$ function keys or tables of natural
logarithms can be used.

The  logistic  transformation,  while  linearizing  the  response  function,  does  not 
eliminate  the  unequal  variances  of  the  error  terms.  Hence,  weighted  least  squares 
should be used. It can be shown that when $n_j$ is reasonably large, the approximate
variance of $p_j'$ is:

(10.58)  $\sigma^2(p_j') = \frac{1}{n_j \pi_j (1 - \pi_j)}$




which  is  estimated  by: 


(10.59)  $s^2(p_j') = \frac{1}{n_j p_j (1 - p_j)}$


Hence, the estimated weights to be used in the weighted least squares computations
are:

(10.60)  $w_j = n_j p_j (1 - p_j)$

Note that the use of the estimated weights (10.60) requires the sample sizes $n_j$ to
be reasonably large.
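
The "it can be shown" step behind the approximate variance (10.58) can be sketched with a first-order Taylor (delta method) argument. This is a standard large-sample approximation and not necessarily the derivation the authors have in mind; the function symbol $g$ is introduced here only for illustration.

```latex
% Delta-method sketch for the variance of the transformed proportion.
% g is illustrative notation; it does not appear in the text.
\begin{align*}
g(p) &= \log_e\!\left(\frac{p}{1-p}\right), \qquad
g'(p) = \frac{1}{p(1-p)} \\
\sigma^2(p_j) &= \frac{\pi_j(1-\pi_j)}{n_j} \\
\sigma^2(p_j') &\approx \left[g'(\pi_j)\right]^2 \sigma^2(p_j)
  = \frac{1}{\pi_j^2(1-\pi_j)^2}\cdot\frac{\pi_j(1-\pi_j)}{n_j}
  = \frac{1}{n_j\,\pi_j(1-\pi_j)}
\end{align*}
```

Because weighted least squares uses weights inversely proportional to the error variances, replacing $\pi_j$ by $p_j$ in the reciprocal of this variance gives the estimated weights (10.60).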

The fitting of a linear regression model with independent variable X and
dependent variable $p_j'$, using weighted least squares, is straightforward. Once the
fitted response function has been obtained:

(10.61)  $\hat{\pi}' = b_0 + b_1 X$

it can be transformed back into the original units, if desired:

(10.62)  $\hat{\pi} = \frac{\exp(b_0 + b_1 X)}{1 + \exp(b_0 + b_1 X)}$


Example 

In  a  study  of  the  effectiveness  of  coupons  offering  a  price  reduction  on  a  given 
product,  1,000  homes  were  selected  and  a  coupon  and  advertising  material  for 
the  product  were  mailed  to  each.  The  coupons  offered  different  price  reductions 
(5,  10,  15,  20,  and  30  cents),  and  200  homes  were  assigned  at  random  to  each  of 
the  price  reduction  categories.  The  independent  variable  X  in  this  study  is  the 
amount  of  price  reduction,  and  the  dependent  variable  Y  is  a  binary  variable 
indicating  whether  or  not  the  coupon  was  redeemed  within  a  six-month  period. 

It  was  expected  that  the  logistic  response  function  would  be  an  appropriate 
description of the relation between price reduction and probability that the
coupon is utilized. Since there were repeat observations at each $X_j$, and since the
number of repeat observations at each $X_j$ was large ($n_j = 200$, for all $j$), the
procedure described earlier could be used for fitting the logistic response
function.

Table  10.8  contains  the  basic  data  for  this  experimental  study  in  columns  1 
through 4. The transformed proportions $p_j'$ are shown in column 5. For instance,
for $X_1 = 5$, we have:

$p_1' = \log_e\left(\frac{.160}{.840}\right) = -1.65823$

Finally, the approximate weights $w_j$ are shown in column 6 of Table 10.8. For
instance, for $X_1 = 5$, we have:

$w_1 = n_1 p_1 (1 - p_1) = 200(.160)(.840) = 26.880$




TABLE 10.8  Data for coupon effectiveness example

   (1)          (2)           (3)            (4)              (5)           (6)
  Price      Number of     Number of     Proportion of    Transformed
Reduction    Households     Coupons         Coupons        Proportion      Weight
                            Redeemed        Redeemed
  $X_j$        $n_j$         $R_j$           $p_j$           $p_j'$         $w_j$

     5          200            32            .160          -1.6582        26.880
    10          200            51            .255          -1.0721        37.995
    15          200            70            .350           -.6190        45.500
    20          200           103            .515            .0600        49.955
    30          200           148            .740           1.0460        38.480

Prior to the fitting of the logistic model, the transformed proportions $p_j'$ were
plotted  against  Xj.  This  plot  is  shown  in  Figure  10.12.  It  appears  from  there  that 
a  linear  response  function  would  fit  the  transformed  proportions  well.  Hence,  it 
was  decided  to  proceed  with  fitting  the  transformed  logistic  model  (10.55). 

The  fitting  of  the  transformed  logistic  model  by  weighted  least  squares  is 
straightforward.  A  computer  run  using  weighted  least  squares,  with  the  weights 
those  in  Table  10.8,  column  6,  led  to  the  following  fitted  response  function: 

(10.63)  $\hat{\pi}' = -2.18506 + .108700X$

This  response  function  is  plotted  in  Figure  10.12,  and  it  appears  to  fit  the  data 
well. 
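
The weighted least squares computation behind the fitted function (10.63) can be sketched in a few lines of plain Python. This is a minimal illustration that uses the closed-form weighted normal equations for a single predictor; it is not the computer run the authors used, and the variable and function names are hypothetical.

```python
import math

# Columns 1-3 of Table 10.8: price reduction, households, coupons redeemed.
X = [5, 10, 15, 20, 30]
n = [200, 200, 200, 200, 200]
R = [32, 51, 70, 103, 148]

# Sample proportions (10.56), logistic transforms (10.57), weights (10.60).
p = [r / m for r, m in zip(R, n)]
p_prime = [math.log(q / (1.0 - q)) for q in p]
w = [m * q * (1.0 - q) for m, q in zip(n, p)]

def wls_line(x, y, w):
    """Weighted least squares fit of y = b0 + b1*x; returns (b0, b1)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = wls_line(X, p_prime, w)
print(f"b0 = {b0:.5f}, b1 = {b1:.6f}")
```

Under these assumptions the printed coefficients should agree with (10.63) up to rounding.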

The fitted response function (10.63) is used in the ordinary manner. To
estimate the probability of a coupon redemption if the price reduction is, say, 25
cents, we first substitute in (10.63):

$\hat{\pi}' = -2.18506 + .108700(25) = .53244$

and then transform to the original variable:

$\text{antilog}_e(.53244) = 1.70308 = \frac{\hat{\pi}}{1 - \hat{\pi}}$

so that:

$\hat{\pi} = .630$

when X = 25 cents. Hence, we estimate that about 63 percent of 25-cent coupons
will be redeemed within six months.
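
The back-transformation (10.62) used in this calculation can be sketched as follows. The function name `estimated_probability` is hypothetical, and the coefficients are simply copied from (10.63).

```python
import math

def estimated_probability(x, b0, b1):
    """Back-transform the fitted logit b0 + b1*x to a probability (10.62)."""
    logit = b0 + b1 * x
    odds = math.exp(logit)          # antilog_e of the fitted logit
    return odds / (1.0 + odds)

# Coefficients copied from the fitted response function (10.63).
b0, b1 = -2.18506, 0.108700
pi_hat = estimated_probability(25, b0, b1)
print(round(pi_hat, 3))
```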

The interpretation of the slope $b_1 = .1087$ in the fitted logistic response
function (10.63) is not simple, unlike the straightforward interpretation of the
slope in a linear regression model. The reason is that the effect of increasing X
by a unit varies for the logistic model according to the location of the starting
point on the X scale. One interpretation of $b_1$ is found in the property of the
logistic function that the “odds” $\pi/(1 - \pi)$ are multiplied by $\exp(b_1)$ for any
unit increase in X.
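
A small numerical check may make the odds interpretation concrete. The sketch below uses the coefficients from (10.63) and hypothetical helper names; it compares a unit increase in X at two starting points. The change in the fitted probability differs between the two starting points, but the odds are multiplied by the same factor each time.

```python
import math

b0, b1 = -2.18506, 0.108700   # copied from (10.63)

def prob(x):
    """Fitted probability from the logistic response function."""
    z = math.exp(b0 + b1 * x)
    return z / (1.0 + z)

def odds(x):
    """Fitted odds pi / (1 - pi) at price reduction x."""
    pr = prob(x)
    return pr / (1.0 - pr)

for start in (5, 20):
    change_in_prob = prob(start + 1) - prob(start)
    odds_ratio = odds(start + 1) / odds(start)
    print(start, round(change_in_prob, 4), round(odds_ratio, 4))

print(round(math.exp(b1), 4))  # common multiplier of the odds
```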




FIGURE 10.12  Plot of transformed data and fitted logistic response function, coupon
effectiveness example
[Figure not reproduced. Vertical axis: Transformed Proportion. Horizontal axis: Price Reduction.]


Comments 

1. The procedure illustrated assumes that no $p_j = 0$ or 1. If one of these cases should
occur, or if some $p_j$ are very close to 0 or 1, modifications in the transformation (10.57)
and in the weights (10.60) should be made. One approach widely used is to modify the
extreme $p_j$ as follows:

(10.64a)  $p_j = \frac{1}{2n_j}$  if $R_j = 0$

(10.64b)  $p_j = 1 - \frac{1}{2n_j}$  if $R_j = n_j$

Reference 10.1 discusses several appropriate modifications for the case when $p_j$ equals 0
or 1.

2. A curvilinear response function of almost the same shape as the logistic function
(10.53) is obtained by transforming the $p_j$ by means of the cumulative normal distribution.
This transformation is called a probit transformation. Reference 10.2 describes this
transformation. The probit transformation leads to more complex calculations than the logistic
transformation, but it does have the desirable property that inferences can be made more
readily with it than with the logistic transformation.

3. The transformed logistic model (10.55) can easily be extended into a multiple
regression model. For, say, two independent variables, the transformed logistic response
function would be:

(10.65)  $\pi' = \beta_0 + \beta_1 X_1 + \beta_2 X_2$

where $\pi'$ is the logistic transformation (10.54) of $\pi$.

4. The logistic model sometimes is motivated by threshold theory. Consider the
breaking strength of concrete blocks, measured in pounds per square inch. Each block is
assumed to have a threshold $T_i$, such that it will break if pressure equal to or greater than
$T_i$ is applied and it will not break if smaller pressure is applied. A block can be tested for
only one pressure level, since some weakening of the block occurs with the first test.
Thus, it is not possible to find the actual threshold of each block, but only whether or not
the threshold level is above or below a particular pressure applied to the block. Of course,
different pressures can be applied to different blocks in order to obtain information about
the breaking strength thresholds of the blocks in the population.

Let $Y_i = 1$ if the block breaks when pressure $X_i$ is applied, and $Y_i = 0$ if the block does
not break. The earlier statement of threshold theory then implies:

(10.66)  $Y_i = 1$ whenever $T_i \le X_i$
         $Y_i = 0$ whenever $T_i > X_i$

It follows that for given pressure $X_i$ applied to a block selected at random:

(10.67)  $\pi_i = P(Y_i = 1 \mid X_i) = P(T_i \le X_i)$

Now $P(T \le X)$ is the cumulative probability distribution of the thresholds of all blocks in
the population. If this cumulative probability distribution of thresholds is logistic:

$P(T \le X) = \frac{\exp(\beta_0 + \beta_1 X)}{1 + \exp(\beta_0 + \beta_1 X)}$

we arrive at the logistic response model (10.53).

Incidentally, if the probability distribution of thresholds were normal, the probit
response model in comment 2 above would be the relevant one according to threshold
theory.
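
To connect the threshold distribution in comment 4 with the regression parameters, suppose (as an assumption made here for illustration, not stated in the text) that the thresholds follow a logistic distribution with location $\mu$ and scale $s > 0$. Its cumulative distribution function then has the form of the logistic response function:

```latex
% mu and s are illustrative symbols for the location and scale of the
% threshold distribution; they do not appear in the text.
\begin{align*}
P(T \le X) &= \frac{1}{1 + \exp\!\left[-(X - \mu)/s\right]}
            = \frac{\exp\!\left[(X - \mu)/s\right]}{1 + \exp\!\left[(X - \mu)/s\right]} \\
\beta_0 &= -\frac{\mu}{s}, \qquad \beta_1 = \frac{1}{s}
\end{align*}
```

Under this parameterization the median threshold is $\mu = -\beta_0/\beta_1$, the pressure at which half of the blocks would be expected to break.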


PROBLEMS 

10.1.  A  student  who  used  a  regression  model  that  included  indicator  variables  was 
upset when receiving only the following output on the multiple regression
printout: XTRANSPOSE X SINGULAR. What is a likely source of the difficulty?

10.2. Refer to regression model (10.4). Portray graphically the response curves for this
model if $\beta_0 = 25.3$, $\beta_1 = .20$, and $\beta_2 = -12.1$.

10.3. In a regression study of factors affecting learning time for a certain task
(measured in minutes), sex of learner was included as an independent variable ($X_2$) that
was coded $X_2 = 1$ if male and 0 if female. It was found that $b_2 = 22.3$ and
$s\{b_2\} = 3.8$. An observer questioned whether the coding scheme for sex is fair
because it results in a positive coefficient, leading to longer learning times for
males than females. Comment.

10.4. Refer to Calculator maintenance Problem 2.16. The users of the desk calculators
are either training institutions that use a student model, or business firms that
employ a commercial model. An analyst at Tri-City wishes to fit a regression
model including both number of calculators serviced ($X_1$) and type of calculator
($X_2$) as independent variables and estimate the effect of calculator model (S:
student, C: commercial) on number of minutes spent on the service call. Records
show that the models serviced in the 18 calls were:

i:       1   2   3   4   5   6   7   8   9
Model:   C   S   C   C   C   S   S   C   C

i:      10  11  12  13  14  15  16  17  18
Model:   C   C   S   S   C   S   C   C   C

Assume that regression model (10.4) is appropriate, and let $X_2 = 1$ if student
model and 0 if commercial model.

a.  Explain  the  meaning  of  all  regression  coefficients  in  the  model. 

b.  Fit  the  regression  model  and  state  the  estimated  regression  function. 

c.  Estimate  the  effect  of  calculator  model  on  mean  service  time  with  a  95 
percent  confidence  interval.  Interpret  your  interval  estimate. 

d.  Why  would  the  analyst  wish  to  include  the  number  of  calculators  variable  in 
the  model  when  interest  is  in  estimating  the  effect  of  type  of  calculator 
model  on  service  time? 

e. Obtain the residuals and plot them against $X_1 X_2$. Is there any indication that
an  interaction  term  in  the  model  would  be  helpful? 

10.5. Refer to Grade point average Problem 2.15. An assistant to the director of
admissions conjectures that the predictive power of the model could be improved
by adding information on whether the student had chosen a major field of
concentration at the time the application was submitted. Assume that regression
model (10.4) is appropriate, where $X_1$ is entrance test score and $X_2 = 1$ if student
had indicated a major field of concentration at the time of application and 0 if the
major field was undecided. Students 3, 4, 5, 6, 9, and 17 had indicated a major
field at the time of application.

a.  Explain  how  each  regression  coefficient  in  model  (10.4)  is  interpreted  here. 

b.  Fit  the  regression  model  and  state  the  estimated  regression  function. 

c. Test whether the $X_2$ variable can be dropped from the regression model; use
$\alpha = .01$. State the alternatives, decision rule, and conclusion.

d. Obtain the residuals for model (10.4) and plot them against $X_1 X_2$. Is there
any evidence in your plot that it would be helpful to include an interaction
term in the model?

10.6. Refer to regression models (10.4) and (10.6). Would the conclusion that $\beta_2 = 0$
have the same implication for each of these models? Explain.

10.7. Refer to regression model (10.6). Portray graphically the response curves for this
model if $\beta_0 = 25$, $\beta_1 = .30$, $\beta_2 = -12.5$, and $\beta_3 = .05$. Describe the nature of
the interaction effect.




10.8. Refer to Calculator maintenance Problems 2.16 and 10.4. Fit regression model
(10.6) and test whether the interaction term can be dropped from the model;
control the $\alpha$ risk at .10. State the alternatives, decision rule, and conclusion. If
the interaction term cannot be dropped from the model, describe the nature of the
interaction effect.

10.9.  Refer  to  Grade  point  average  Problems  2.15  and  10.5. 

a.  Fit  regression  model  (10.6)  and  state  the  estimated  regression  function. 

b. Test whether the interaction term can be dropped from the model; use
$\alpha = .05$. State the alternatives, decision rule, and conclusion. If the interaction
term cannot be dropped from the model, describe the nature of the interaction
effect.

10.10. In a regression analysis of on-the-job head injuries to warehouse laborers caused
by falling objects, Y is a measure of severity of the injury, $X_1$ is an index reflecting
both the weight of the object and the distance it fell, and $X_2$ and $X_3$ are
indicator variables for nature of head protection worn at the time of the accident,
coded as follows:

Type of Protection    $X_2$    $X_3$

Hard hat                1        0
Bump cap                0        1
None                    0        0

The response function to be used in the study is
$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$.

a. Develop the response function for each type-of-protection category.

b. For each of the following questions, specify the alternatives $H_0$ and $H_a$ for
the appropriate test: (1) With $X_1$ fixed, does wearing a bump cap reduce the
expected severity of injury as compared with wearing no protection? (2)
With $X_1$ fixed, is the expected severity of injury the same when wearing a
hard hat as when wearing a bump cap?

10.11. Refer to the tool wear regression model (10.12). Suppose the indicator variables
had been defined as follows: $X_2 = 1$ if tool model M2 and 0 otherwise, $X_3 = 1$ if
tool model M3 and 0 otherwise, $X_4 = 1$ if tool model M4 and 0 otherwise.
Indicate the meaning of each of the following: (1) f}3, (2) - (3$, (3) , (4)
Pi ■

10.12. Refer to the advertising expenditures regression model (10.16).

a. How is $\beta_4$ interpreted here? What is the meaning of $\beta_3 - \beta_2$ here?

b. State the alternatives for a test of whether the response functions are the
same in incorporated and not-incorporated firms with high-quality sales
management.

10.13. A marketing research trainee in the national office of a chain of shoe stores used
the following response function to study seasonal (winter, spring, summer, fall)
effects on sales of a certain line of shoes:
$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$.
The X's are indicator variables defined as follows: $X_1 = 1$ if winter and 0
otherwise, $X_2 = 1$ if spring and 0 otherwise, $X_3 = 1$ if fall and 0 otherwise. After
fitting the model, she tested the regression coefficients $\beta_j$ ($j = 0, \ldots, 3$) and
came to the following set of conclusions at an .05 family level of significance:
$\beta_0 \ne 0$, $\beta_1 = 0$, $\beta_2 \ne 0$, $\beta_3 \ne 0$. In her report she then wrote: “Results of
regression analysis show that climatic and other seasonal factors have no influence
in determining sales of this shoe line in the winter. Seasonal influences do
exist in the other seasons.” Do you agree with this interpretation of the test
results?

10.14. Assessed valuations. A tax consultant studied the current relation between
selling price and assessed valuation of one-family residential dwellings in a large
tax district. He obtained data for a random sample of nine recent “arm’s-length”
sales transactions of one-family dwellings located on corner lots and also obtained
data for a random sample of 14 recent sales of one-family dwellings not
located on corner lots. In the data that follow, both selling price ($Y$) and assessed
valuation ($X_1$) are expressed in thousand dollars. Assume that the error term
variances in the two populations are equal and that regression model (10.6) is
appropriate.


Corner Lots

i:            1     2     3     4     5     6     7     8     9
$X_{i1}$:  17.5  12.5  20.0  16.0  15.0  14.7  17.5  12.3  11.5
$Y_i$:     56.2  42.5  68.6  54.8  50.0  47.5  56.9  34.0  39.0

Noncorner Lots

i:            1     2     3     4     5     6     7     8     9    10    11    12    13    14
$X_{i1}$:  10.0  13.8  15.0  19.5  17.0  12.5  14.5  12.8  12.0  16.0  10.0  17.0  10.8  15.0
$Y_i$:     31.2  36.9  41.0  51.8  48.0  33.3  38.0  35.9  32.0  44.3  29.0  46.1  30.0  42.0

a.  Plot  the  sample  data  for  the  two  populations  on  one  graph,  using  different 
symbols  for  the  two  samples.  Does  the  regression  relation  appear  to  be  the 
same  for  the  two  populations? 

b.  Test  for  identity  of  the  regression  functions  for  dwellings  on  corner  lots  and 
dwellings in other locations; control the risk of Type I error at .10. State the
alternatives,  decision  rule,  and  conclusion. 

c.  Plot  the  estimated  regression  functions  for  the  two  populations  and  describe 
the  nature  of  the  differences  between  them. 

d.  Estimate  the  difference  in  the  slopes  of  the  two  regression  functions  using  a 
90  percent  confidence  interval. 

e.  Prepare  a  residual  plot  for  each  sample.  Does  the  assumption  of  equal  error 
term  variances  appear  to  be  reasonable  here? 

10.15. Tire testing. A testing laboratory with equipment that simulates highway driving
studied for two makes (A, B) of a certain type of truck tire the relation
between operating cost per mile ($Y$) and cruising speed ($X_1$). The observations
are shown below (all data are coded). An engineer now wishes to decide whether
or not the regression of operating cost on cruising speed is the same for the two
makes. Assume that the error term variances for the two makes of tires are the
same and that model (10.6) is appropriate.

Make A

i:            1     2     3     4     5     6     7     8     9    10
$X_{i1}$:    10    20    20    30    40    40    50    60    60    70
$Y_i$:      9.8  12.5  14.2  14.9  19.0  16.5  20.9  22.4  24.1  25.8

Make B

i:            1     2     3     4     5     6     7     8     9    10
$X_{i1}$:    10    20    20    30    40    40    50    60    60    70
$Y_i$:     15.0  14.5  16.1  16.5  16.4  19.1  20.9  22.3  19.8  21.4

a.  Plot  the  sample  data  for  the  two  populations  on  one  graph,  using  different 
symbols  for  the  two  samples.  Does  the  relation  between  speed  and  cost 
appear  to  be  the  same  for  the  two  makes  of  tires? 

b. Test whether or not the regression functions are the same for the two makes
of tires. Control the risk of Type I error at .05. State the alternatives, decision
rule, and conclusion.

c. Suppose the question of interest simply had been whether the two regression
lines have equal slopes. Answer this question by setting up a 95 percent
confidence interval for the difference between the two slopes. What do you
find?

d.  Prepare  a  residual  plot  for  each  make  of  tire.  Does  the  assumption  of  equal 
error  term  variances  appear  to  be  reasonable  here? 

10.16. Refer to Muscle mass Problem 2.23. The nutritionist conjectures that the regression
of muscle mass on age follows a two-piece linear relation, with the slope
changing at age 60 without discontinuity.

a.  State  the  regression  model  that  applies  if  the  nutritionist’s  conjecture  is 
correct.  What  are  the  respective  response  functions  when  age  is  60  or  less 
and  when  age  is  over  60? 

b. Fit the regression model specified in part (a) and state the estimated regression
function.

c. Test whether a two-piece linear regression function is needed; use $\alpha = .01$.
State the alternatives, decision rule, and conclusion. What is the P-value of
the test?

10.17. Shipment handling. Global Electronics periodically imports shipments of a
certain large part used as a component in several of its products. The size of the
shipment varies depending upon production schedules. For handling and distribution
to assembling plants, shipments of size 250 thousand parts or less are sent
to warehouse A; larger shipments are sent to warehouse B since this warehouse
has specialized equipment that provides greater economies of scale for large
shipments. The data below were collected on the 10 most recent shipments; X is
the size of the shipment (in thousand parts), and Y is the direct cost of handling
the shipment in the warehouse (in thousand dollars).

i:           1      2      3      4      5      6      7      8      9     10
$X_i$:     225    350    150    200    175    180    325    290    400    125
$Y_i$:   11.95  14.13   8.93  10.98  10.03  10.13  13.75  13.30  15.00   7.97

A two-piecewise linear regression model with a possible discontinuity at
$X = 250$ is to be fitted.

a.  Specify  the  regression  model  to  be  used. 

b.  Fit  this  regression  model.  Plot  the  fitted  response  function  and  the  data.  Is 
there  any  indication  that  greater  economies  of  scale  are  obtained  in  handling 
relatively  large  shipments  than  relatively  small  ones? 

c. Test whether or not both the two separate slopes and the discontinuity can be
dropped from the model. Control the level of significance at .025. State the
alternatives, decision rule, and conclusion.

d. For relatively small shipments, what is the point estimate of the increase in
expected handling cost for each increase of one thousand in the size of the
shipment? What is the corresponding estimate for relatively large shipments?

10.18. In time series analysis, the X variable representing time usually is defined to take
on  values  1,  2,  etc.,  for  the  successive  time  periods.  Does  this  represent  an 
allocated  code  when  the  time  periods  are  actually  1979,  1980,  etc.? 

10.19. An analyst wishes to include number of older siblings in family as an independent
variable in a regression analysis of factors affecting maturation in eighth
graders.  The  number  of  older  siblings  in  the  sample  observations  ranges  from 
zero  to  four.  Discuss  whether  this  variable  should  be  placed  in  the  model  as  an 
ordinary  quantitative  variable  or  by  means  of  four  0,1  indicator  variables. 

10.20. Refer to regression model (10.2) for the insurance innovation study. Suppose $X_0$
were dropped to eliminate the linear dependence in the X matrix so that the
model is $Y_i = \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i$. What is the meaning here of the
regression coefficients $\beta_1$, $\beta_2$, and $\beta_3$?

10.21. Refer to Figure 10.10. A student stated: “The least squares line cannot be correct
as it stands because it is obvious that when Y can take on only values of 0
and  1  the  least  squares  line  has  to  be  horizontal.”  Comment.  What  would  be  the 
implication  if  the  least  squares  line  were  horizontal? 

10.22. Performance ability. A psychologist made a small-scale study to examine the
nature of the relation, if any, between an employee’s emotional stability ($X$) and
the employee’s ability to perform in a task group ($Y$). Emotional stability was
measured by a written test and ability to perform in a task group ($Y = 1$ if able,
$Y = 0$ if unable) was evaluated by the supervisor. The results were:

i:         1    2    3    4    5    6    7    8
$X_i$:   474  619  584  638  399  481  624  582
$Y_i$:     0    1    0    1    0    1    1    1

a.  Fit  a  linear  regression  function  by  ordinary  least  squares. 

b.  Now  use  the  two-stage  weighted  least  squares  procedure  to  fit  the  linear 
regression  function.  Was  use  of  weighted  least  squares  helpful  here  in  terms 
of  the  precision  of  the  regression  coefficients? 

c.  Is  there  any  evidence  available  from  this  small-scale  study  whether  or  not  a 
linear response function is appropriate for the range of X values
encountered?

10.23. Annual dues. The board of directors of a professional association conducted a
random sample survey of 30 members to assess the effects of several possible
amounts of dues increase. The sample results follow. X denotes the dollar increase
in annual dues posited in the survey interview and $Y = 1$ if the interviewee
indicated that the membership will not be renewed at that amount of dues
increase and 0 if the membership will be renewed.

i:        1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
$X_i$:   30  30  30  31  32  33  34  35  35  35  36  37  38  39  40
$Y_i$:    0   1   0   0   0   0   1   0   0   1   1   0   0   1   0

i:       16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
$X_i$:   40  40  41  42  43  44  45  45  45  46  47  48  49  50  50
$Y_i$:    1   1   0   1   1   1   0   1   1   0   1   1   0   1   1

a.  Fit  a  linear  regression  function  by  ordinary  least  squares.  State  the  estimated 
regression  function.  Does  the  estimated  regression  function  fall  below  0  or 
above  1  within  the  scope  of  the  fitted  model? 

b.  Construct  a  95  percent  confidence  interval  for  the  slope  of  the  response 
function.  Interpret  your  interval  estimate. 

c. Estimate the mean response for a $40 dues increase. Use a 95 percent
confidence interval. Interpret your interval estimate.

d.  Fit  a  linear  regression  function  by  the  two-stage  weighted  least  squares 
procedure.  Was  weighted  least  squares  helpful  here  in  terms  of  the  precision 
of  the  regression  coefficients?  Explain. 

10.24. Bottle return. A carefully controlled experiment was conducted to study the
effect of the size of the deposit level on the likelihood that a returnable one-liter
soft-drink bottle will be returned. A bottle return was scored 1, and no return was
scored 0. The data to follow show the number of bottles that were returned ($R_j$)
out of 500 sold ($n_j$) at each of six deposit levels ($X_j$, in cents):

j:                        1    2    3    4    5    6
Deposit level $X_j$:      2    5   10   20   25   30
Number sold $n_j$:      500  500  500  500  500  500
Number returned $R_j$:   72  103  170  296  406  449

An  analyst  believes  that  the  logistic  response  function  (10.53)  is  appropriate  for 
studying  the  relation  between  size  of  deposit  and  the  probability  a  bottle  will  be 
returned. 

a.  Apply  transformation  (10.57)  and  plot  the  transformed  proportions  against 
X.  Does  the  plot  support  the  use  of  the  transformed  model  (10.55)? 

b.  Fit  the  logistic  response  function  using  the  transformed  proportions  and 
weighted  least  squares,  and  state  the  fitted  response  function. 

c.  What  is  the  estimated  probability  that  a  bottle  will  be  returned  when  the 
deposit  is  10  cents?  When  the  deposit  is  30  cents? 

d.  Estimate  the  amount  of  deposit  for  which  75  percent  of  the  bottles  will  be 
returned. 

10.25. Toxicity experiment. In an experiment testing the effect of a toxic substance,
1,500 experimental insects were divided at random into six groups of 250 each.
The insects in each group were exposed to a fixed dose of the toxic substance. A
day later, each insect was observed. Death from exposure was scored 1, and
survival was scored 0. The results are shown below; $X_j$ denotes the dose level (on
a logarithmic scale) received by the insects in group j and $R_j$ denotes the number
of insects that died out of the 250 ($n_j$) in the group.

j:         1    2    3    4    5    6
$X_j$:     1    2    3    4    5    6
$R_j$:    28   53   93  126  172  197
$n_j$:   250  250  250  250  250  250




a.  Fit  the  logistic  response  function  (10.53)  using  transformation  (10.57)  and 
weighted  least  squares.  State  the  fitted  response  function  in  the  original 
units. 

b.  Plot  the  fitted  response  function  in  the  original  units  and  the  original  data. 
Does  the  fit  appear  to  be  a  good  one? 

c. What is the estimated median dose, i.e., the dose for which 50 percent of
the experimental insects would be expected to die?


EXERCISES 

10.26. Refer to the instrument calibration study Example 2 in Section 10.4. Suppose
that three instruments (A, B, C) had been developed to identical specifications,
that the regression functions relating gauge reading ($Y$) to actual pressure ($X_1$)
are second-order polynomials for each instrument, that the error term variances
are the same, and that the polynomial coefficients may differ from one instrument
to the next. Let $X_3$ denote a second indicator variable, where $X_3 = 1$ if
instrument C and 0 otherwise.

a.  Expand  model  (10.21)  to  cover  this  situation. 

b.  State  the  alternatives,  define  the  test  statistic,  and  give  the  decision  rule  for 
each  of  the  following  tests  when  the  level  of  significance  is  .01:  (1)  test 
whether  the  second-order  regression  functions  for  the  three  instruments  are 
identical, (2) test whether all three regression functions have the same
intercept, (3) test whether both the linear and quadratic effects are the same in all
three  regression  functions. 

10.27. Refer to Muscle mass Problem 10.16. Specify the model for the case where the
slope  changes  at  age  40  and  again  at  age  60  with  no  discontinuities. 

10.28. In a regression study, three types of banks were involved, namely, commercial,
mutual savings, and savings and loan. Consider the following system of indicator
variables for type of bank:

Type of Bank        $X_2$    $X_3$

Commercial            1        0
Mutual savings        0        1
Savings and loan     -1       -1

a. Develop a first-order linear regression model for relating last year’s profit or
loss ($Y$) to size of bank ($X_1$) and type of bank ($X_2$, $X_3$).

b. State the response functions for the three types of banks.

c. Interpret each of the following quantities: (1) $\beta_2$, (2) $\beta_3$, (3) $-\beta_2 - \beta_3$.

10.29.  Refer  to  regression  model  (10.17). 

a.  Obtain the X'X matrix for this special case of a single qualitative independent
variable, for i = 1

b.  Using  (7.21),  find  b. 

c.  Using  (7.26)  and  (7.27),  find  SSR  and  SSE. 




PROJECTS 

10.30.  Refer to the SMSA data set. The number of active physicians (Y) is to be
regressed against total population (X1), total personal income (X2), and geographic
region.

a.  Fit a first-order regression model. Let X3 = 1 if NE and 0 otherwise, X4 = 1
if NC and 0 otherwise, and X5 = 1 if S and 0 otherwise.

b.  Examine whether the effect for the Northeastern region on number of active
physicians differs from the effect for the North Central region by constructing
an appropriate 90 percent confidence interval. Interpret your interval
estimate.

c.  Test whether any geographic effects are present; use α = .10. State the
alternatives, decision rule, and conclusion. What is the P-value of the test?

10.31.  Refer to the SENIC data set. Infection risk (Y) is to be regressed against length of
stay (X1), age (X2), routine chest X-ray ratio (X3), and medical school affiliation
(X4).

a.  Fit a first-order regression model. Let X4 = 1 if hospital has medical school
affiliation and 0 if not.

b.  Estimate the effect of medical school affiliation on infection risk using a 98
percent confidence interval. Interpret your interval estimate.

c.  It has been suggested that the effect of medical school affiliation on infection
risk may interact with the effects of age and routine chest X-ray ratio.
Add appropriate interaction terms to the model, fit the revised regression
model, and test whether the interaction terms are helpful; use α = .10. State
the alternatives, decision rule, and conclusion.

10.32.  Refer to the SENIC data set. Length of stay (Y) is to be regressed on age (X1),
routine culturing ratio (X2), average daily census (X3), available facilities and
services (X4), and region (X5, X6, X7).

a.  Fit a first-order regression model. Let X5 = 1 if NE and 0 otherwise, X6 = 1
if NC and 0 otherwise, and X7 = 1 if S and 0 otherwise.

b.  Test whether the routine culturing ratio can be dropped from the model; use
a level of significance of .05. State the alternatives, decision rule, and
conclusion.

c.  Examine whether the effect on length of stay for hospitals located in the
Western region differs from that for hospitals located in the other three
regions by constructing an appropriate confidence interval for each pairwise
comparison. Use the Bonferroni procedure with a 95 percent family confidence
coefficient. Summarize your findings.

10.33.  Refer to the SENIC data set. A public health official has requested a study of
whether medical school affiliation (Y) is related to available facilities and services
(X). Let Y = 1 if the hospital has a medical school affiliation and 0 if not.

a.  Fit  a  linear  response  function,  using  ordinary  least  squares. 

b.  Plot  the  data  and  the  fitted  regression  line.  Is  the  regression  line  consistent 
with  probability  characteristics  (i.e.,  is  it  between  0  and  1)  within  the  scope 
of  the  observations? 




CITED  REFERENCES 

10.1  Cox,  D.  R.  The  Analysis  of  Binary  Data.  London:  Methuen  &  Co.  Ltd.,  1970. 

10.2  Finney,  D.  J.  Probit  Analysis.  3d  ed.  Cambridge:  Cambridge  University  Press, 
1971. 


11 


Multicollinearity,  influential 
observations,  and  other  topics 
in  regression  analysis — II 


In  this  chapter,  we  take  up  selected  topics  in  regression  analysis  dealing  with 
reparameterization  to  improve  computational  accuracy,  multicollinearity,  and 
detection  of  influential  observations. 

11.1  REPARAMETERIZATION  TO  IMPROVE  COMPUTATIONAL 
ACCURACY 

Roundoff  errors  in  least  squares  calculations 

Least squares results can be sensitive to rounding of data in intermediate
stages of calculations. When the number of independent variables is small (say,
three or fewer), roundoff effects can be controlled by carrying a sufficient number
of digits in intermediate calculations. Indeed, most computer regression programs
use double precision arithmetic (e.g., use of 16 digits instead of 8 digits)
in all computations to control roundoff effects. Still, with a large number of
independent variables, serious roundoff effects can arise despite the use of many
digits in intermediate calculations.

Roundoff errors tend to enter into least squares calculations primarily when
the inverse of X'X is taken. Of course, any errors in (X'X)^{-1} may be magnified
when calculating b or making other subsequent calculations. The danger of serious
roundoff errors in (X'X)^{-1} is particularly great when (1) X'X has a determinant
which is close to zero and/or (2) the elements of X'X differ substantially in
order of magnitude. The first condition arises when some or all of the independent
variables are highly intercorrelated. Remedial measures for this situation will
be discussed in Section 11.4. One such measure is to discard one or several of the
intercorrelated independent variables in an effort to shift the determinant away
from near zero so that roundoff errors will not have as severe an effect.

The second condition arises when the variables have substantially different
magnitudes so that the entries in the X'X matrix cover a wide range, say, from 15
to 49,000,000. A solution for this condition is to transform the variables and
thereby reparameterize the regression model. We now address this second problem.
The transformation which we take up is called the correlation transformation.
It makes all entries in the X'X matrix for the transformed variables fall
between -1 and +1 inclusive, so that the calculation of the inverse matrix
becomes much less subject to roundoff errors due to dissimilar orders of magnitude
than with the original variables. Many computer regression packages automatically
use this transformation to obtain the basic regression results and then
retransform to the original variables. In any case, users of computer programs are
well advised to check that the regression package to be employed makes appropriate
provisions to prevent roundoff errors from getting out of hand.
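The effect of dissimilar magnitudes can be seen in a small numerical sketch. The data below are hypothetical, and the condition number reported by NumPy is used here only as a convenient stand-in for "sensitivity of the inverse to roundoff"; it is our choice of diagnostic, not one named in the text. The transformation applied is the correlation transformation defined in (11.6) of this section.

```python
import numpy as np

def correlation_transform(v):
    """Center a column and divide by sqrt(n - 1) times its standard deviation."""
    v = np.asarray(v, dtype=float)
    n = v.shape[0]
    return (v - v.mean()) / (np.sqrt(n - 1) * v.std(ddof=1))

# Hypothetical data: X1 is of order 10, X2 of order 1,000.
x1 = np.array([3.0, 5.0, 4.0, 8.0, 6.0, 7.0])
x2 = np.array([1006.0, 1010.0, 1003.0, 1012.0, 1001.0, 1009.0])

# X'X for the original variables (X includes a column of 1's).
X_raw = np.column_stack([np.ones_like(x1), x1, x2])
XtX_raw = X_raw.T @ X_raw

# X'X for the transformed variables (no column of 1's is needed).
X_tr = np.column_stack([correlation_transform(x1), correlation_transform(x2)])
XtX_tr = X_tr.T @ X_tr

cond_raw = np.linalg.cond(XtX_raw)
cond_tr = np.linalg.cond(XtX_tr)

print("entries of original X'X range from", XtX_raw.min(), "to", XtX_raw.max())
print("condition number, original X'X:   ", cond_raw)
print("condition number, transformed X'X:", cond_tr)
```

The entries of the original X'X span several orders of magnitude, while those of the transformed X'X all lie between -1 and +1.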


Correlation  transformation 

We shall illustrate the correlation transformation for the case of two independent
variables. The basic regression model which we shall assume is the usual
first-order one:

(11.1)  Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i

The first step in order to make the entries in the X'X matrix of similar magnitude
is to express all observations on the independent variables as deviations
from their respective means. To illustrate this for n = 2, if:

X_{11} = 3,  X_{21} = 5  and  X_{12} = 1,006,  X_{22} = 1,010

the deviations would be:

X_{11} - \bar{X}_1 = -1,  X_{21} - \bar{X}_1 = 1  and  X_{12} - \bar{X}_2 = -2,  X_{22} - \bar{X}_2 = 2

Note that the deviations are of similar magnitude whereas the original observations
were not.

To use the deviations X_{i1} - \bar{X}_1 and X_{i2} - \bar{X}_2, model (11.1) must be modified
by adding and subtracting the same terms:

Y_i = (\beta_0 + \beta_1 \bar{X}_1 + \beta_2 \bar{X}_2) + \beta_1 (X_{i1} - \bar{X}_1) + \beta_2 (X_{i2} - \bar{X}_2) + \varepsilon_i

or:

(11.2)  Y_i = \beta_0' + \beta_1 (X_{i1} - \bar{X}_1) + \beta_2 (X_{i2} - \bar{X}_2) + \varepsilon_i

where:

(11.2a)  \beta_0' = \beta_0 + \beta_1 \bar{X}_1 + \beta_2 \bar{X}_2

It can be shown that the least squares estimator of \beta_0' is always \bar{Y}. Hence, we
can rewrite (11.2) as follows:

(11.3)  Y_i - \bar{Y} = \beta_1 (X_{i1} - \bar{X}_1) + \beta_2 (X_{i2} - \bar{X}_2) + \varepsilon_i

While it might appear that by eliminating one parameter from the model we have
been able to increase the degrees of freedom available for MSE by one, this is not
so since the dependent variable observations Y_i - \bar{Y} are now subject to the
restriction \sum (Y_i - \bar{Y}) = 0.

The second step in developing the correlation transformation is to express
each deviation variable in units of its standard deviation:

(11.4)  \frac{Y_i - \bar{Y}}{s_Y}, \qquad \frac{X_{i1} - \bar{X}_1}{s_1}, \qquad \frac{X_{i2} - \bar{X}_2}{s_2}

where s_Y, s_1, and s_2 are the respective standard deviations of Y, X1, and X2
defined as follows:

(11.5a)  s_Y = \sqrt{\frac{\sum (Y_i - \bar{Y})^2}{n - 1}}

(11.5b)  s_1 = \sqrt{\frac{\sum (X_{i1} - \bar{X}_1)^2}{n - 1}}

(11.5c)  s_2 = \sqrt{\frac{\sum (X_{i2} - \bar{X}_2)^2}{n - 1}}

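The final step in obtaining the correlation transformation is to use the following
function of the standardized variables in (11.4):

(11.6a)  Y_i' = \frac{1}{\sqrt{n - 1}} \left( \frac{Y_i - \bar{Y}}{s_Y} \right)

(11.6b)  X_{i1}' = \frac{1}{\sqrt{n - 1}} \left( \frac{X_{i1} - \bar{X}_1}{s_1} \right)

(11.6c)  X_{i2}' = \frac{1}{\sqrt{n - 1}} \left( \frac{X_{i2} - \bar{X}_2}{s_2} \right)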

Reparameterized  model 

The regression model with the transformed variables Y', X1', and X2' as defined
by the correlation transformation in (11.6) is a simple extension of model
(11.3):

(11.7)  Y_i' = \beta_1' X_{i1}' + \beta_2' X_{i2}' + \varepsilon_i'


It is easy to show that the new parameters β1' and β2' in (11.7) and the original
parameters β0, β1, and β2 in (11.1) are related as follows:

(11.8a)  \beta_1 = \left( \frac{s_Y}{s_1} \right) \beta_1'

(11.8b)  \beta_2 = \left( \frac{s_Y}{s_2} \right) \beta_2'

(11.8c)  \beta_0 = \bar{Y} - \beta_1 \bar{X}_1 - \beta_2 \bar{X}_2

Thus, the new regression coefficients β1' and β2' and the original regression
coefficients β1 and β2 are related by simple scaling factors involving ratios of
standard deviations.


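X'X matrix for transformed variables

The X matrix for the transformed variables in model (11.7) is:

\mathbf{X} = \begin{bmatrix} X_{11}' & X_{12}' \\ X_{21}' & X_{22}' \\ \vdots & \vdots \\ X_{n1}' & X_{n2}' \end{bmatrix}

Remember that model (11.7) does not contain an intercept term; hence, there is
no column of 1's in the X matrix. The X'X matrix then is:

(11.9)  \mathbf{X}'\mathbf{X}
        = \begin{bmatrix} X_{11}' & X_{21}' & \cdots & X_{n1}' \\ X_{12}' & X_{22}' & \cdots & X_{n2}' \end{bmatrix}
          \begin{bmatrix} X_{11}' & X_{12}' \\ X_{21}' & X_{22}' \\ \vdots & \vdots \\ X_{n1}' & X_{n2}' \end{bmatrix}
        = \begin{bmatrix} \sum (X_{i1}')^2 & \sum X_{i1}' X_{i2}' \\ \sum X_{i2}' X_{i1}' & \sum (X_{i2}')^2 \end{bmatrix}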

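Let us now consider the elements of this X'X matrix. First, we have:

\sum (X_{i1}')^2 = \sum \left( \frac{X_{i1} - \bar{X}_1}{\sqrt{n - 1}\, s_1} \right)^2
                 = \frac{\sum (X_{i1} - \bar{X}_1)^2}{n - 1} \div s_1^2
                 = \frac{s_1^2}{s_1^2} = 1

Similarly:

\sum (X_{i2}')^2 = 1

Finally:

\sum X_{i1}' X_{i2}' = \sum \left( \frac{X_{i1} - \bar{X}_1}{\sqrt{n - 1}\, s_1} \right) \left( \frac{X_{i2} - \bar{X}_2}{\sqrt{n - 1}\, s_2} \right)
                     = \frac{1}{n - 1} \cdot \frac{\sum (X_{i1} - \bar{X}_1)(X_{i2} - \bar{X}_2)}{s_1 s_2}
                     = \frac{\sum (X_{i1} - \bar{X}_1)(X_{i2} - \bar{X}_2)}{\left[ \sum (X_{i1} - \bar{X}_1)^2 \sum (X_{i2} - \bar{X}_2)^2 \right]^{1/2}}

But this equals r_{12}, the coefficient of correlation between X1 and X2 by (3.73).
Since \sum X_{i1}' X_{i2}' = \sum X_{i2}' X_{i1}', we find that the X'X matrix for the transformed
variables, denoted by r_XX, is: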


(11.10)  \mathbf{r}_{XX} = \mathbf{X}'\mathbf{X} = \begin{bmatrix} 1 & r_{12} \\ r_{12} & 1 \end{bmatrix}

By (3.72), r_{12} must fall between -1 and +1 inclusive. Hence, it follows that
the elements of the r_XX matrix must have values between -1 and +1 inclusive.
While our example dealt with the case of two independent variables, the same
result follows for any number of independent variables transformed according to
the principle of (11.6).
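The result that X'X for the transformed variables is the matrix of pairwise correlations can be checked numerically. The following is a minimal sketch on hypothetical data with three independent variables; the function name `correlation_transform` is ours, and `numpy.corrcoef` is used only as an independent check.

```python
import numpy as np

def correlation_transform(columns):
    """Apply (11.6) to each column: (value - mean) / (sqrt(n - 1) * s)."""
    A = np.asarray(columns, dtype=float)
    n = A.shape[0]
    return (A - A.mean(axis=0)) / (np.sqrt(n - 1) * A.std(axis=0, ddof=1))

# Hypothetical observations on three independent variables of very
# different magnitudes (one row per observation).
X = np.array([
    [3.0, 1006.0, 0.20],
    [5.0, 1010.0, 0.15],
    [4.0, 1003.0, 0.32],
    [8.0, 1012.0, 0.11],
    [6.0, 1001.0, 0.27],
    [7.0, 1009.0, 0.18],
])

X_t = correlation_transform(X)
r_XX = X_t.T @ X_t              # X'X for the transformed variables

print(np.round(r_XX, 4))
print(np.round(np.corrcoef(X, rowvar=False), 4))   # same matrix
```

The diagonal elements equal 1 and the off-diagonal elements are the correlation coefficients, whatever the original units of measurement.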

The matrix:

\mathbf{r}_{XX} = \begin{bmatrix} 1 & r_{12} \\ r_{12} & 1 \end{bmatrix}

is called the correlation matrix of the independent variables. For p - 1 independent
variables, r_XX is a (p - 1) × (p - 1) matrix containing 1's on the main
diagonal and the correlation coefficients r_{ij} off the main diagonal.
Regression  calculations 


The least squares estimators b1' and b2' for the transformed model (11.7) are
obtained in the usual fashion. The inverse of the r_XX matrix is:

(11.11)  \mathbf{r}_{XX}^{-1} = \frac{1}{1 - r_{12}^2} \begin{bmatrix} 1 & -r_{12} \\ -r_{12} & 1 \end{bmatrix}

It is easy to show that the X'Y matrix for the transformed variables is:

(11.12)  \mathbf{X}'\mathbf{Y} = \begin{bmatrix} r_{Y1} \\ r_{Y2} \end{bmatrix}

where r_{Y1} and r_{Y2} are the coefficients of correlation between Y and X1 and
between Y and X2, respectively. Hence, the estimated regression coefficients for
the reparameterized model (11.7) are by (7.21):


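(11.13)  \mathbf{b} = \frac{1}{1 - r_{12}^2} \begin{bmatrix} 1 & -r_{12} \\ -r_{12} & 1 \end{bmatrix} \begin{bmatrix} r_{Y1} \\ r_{Y2} \end{bmatrix}
                    = \frac{1}{1 - r_{12}^2} \begin{bmatrix} r_{Y1} - r_{12} r_{Y2} \\ r_{Y2} - r_{12} r_{Y1} \end{bmatrix}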

The return to the estimated regression coefficients for the original model is
accomplished by employing the relations in (11.8):

(11.14a)  b_1 = \left( \frac{s_Y}{s_1} \right) b_1'

(11.14b)  b_2 = \left( \frac{s_Y}{s_2} \right) b_2'

(11.14c)  b_0 = \bar{Y} - b_1 \bar{X}_1 - b_2 \bar{X}_2
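The calculations in (11.13) and (11.14) can be traced step by step. The sketch below uses hypothetical data and compares the result with a direct least squares fit on the original variables; `numpy.linalg.lstsq` serves only as the comparison and is not the method described in the text.

```python
import numpy as np

# Hypothetical data for illustration.
x1 = np.array([3.0, 5.0, 4.0, 8.0, 6.0, 7.0])
x2 = np.array([1006.0, 1010.0, 1003.0, 1012.0, 1001.0, 1009.0])
y = np.array([12.0, 19.0, 13.0, 27.0, 16.0, 24.0])

# Pairwise coefficients of correlation.
r = np.corrcoef(np.vstack([y, x1, x2]))
r_y1, r_y2, r_12 = r[0, 1], r[0, 2], r[1, 2]

# (11.13): estimated coefficients for the transformed model.
b1_t = (r_y1 - r_12 * r_y2) / (1.0 - r_12 ** 2)
b2_t = (r_y2 - r_12 * r_y1) / (1.0 - r_12 ** 2)

# (11.14): return to the coefficients of the original model.
s_y, s_1, s_2 = y.std(ddof=1), x1.std(ddof=1), x2.std(ddof=1)
b1 = (s_y / s_1) * b1_t
b2 = (s_y / s_2) * b2_t
b0 = y.mean() - b1 * x1.mean() - b2 * x2.mean()

# Direct least squares fit on the original variables, for comparison.
X = np.column_stack([np.ones_like(x1), x1, x2])
b_direct, *_ = np.linalg.lstsq(X, y, rcond=None)

print("via correlation transformation:", b0, b1, b2)
print("direct least squares:          ", b_direct)
```

Both routes give the same b0, b1, and b2 up to rounding, which is the sense in which the transformed model is only a reparameterization of (11.1).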

Comments 

1.  Some computer packages present both the regression coefficients b_k for the original
model as well as the coefficients b_k' for the transformed model. The latter are sometimes
labeled beta coefficients in printouts.

2.  The regression coefficients b_k' for the reparameterized model are the same as the
standardized regression coefficients B_k discussed in Section 7.9, as a comparison of
(7.69) with (11.14) makes clear. Thus, use of the transformed variables in (11.6)
automatically leads to the standardized regression coefficients.

3.  Some computer printouts show the magnitude of the determinant of the correlation
matrix of the independent variables. A near-zero value for this determinant implies both a
high degree of linear association among the independent variables and a high potential for
roundoff errors. For the case of two independent variables, this determinant is seen to be
1 - r_{12}^2, which approaches 0 as r_{12}^2 approaches 1.

4.  When the correlation matrix of the independent variables is augmented by a row
and column for Y, it is called the correlation matrix. A correlation matrix shows the
coefficients of correlation for all pairs of dependent and independent variables. This
information is useful in a variety of tasks, for instance, in selecting the final independent
variables to be included in the model. Many computer programs display the correlation
matrix in the printout.

For the case of two independent variables, the correlation matrix is as follows:

\begin{bmatrix} 1 & r_{Y1} & r_{Y2} \\ r_{Y1} & 1 & r_{12} \\ r_{Y2} & r_{12} & 1 \end{bmatrix}

Since the correlation matrix is symmetric, the lower (or upper) triangular block of elements
is frequently omitted in computer printouts.

5.  It is possible to use the correlation transformation with a computer package that
does not permit regression through the origin, because the intercept coefficient b0 will
always be zero for data so transformed. The other regression coefficients will also be
correct, as will be all sums of squares. The degrees of freedom and mean squares will not
all be correct, however, and will need to be corrected for the regression through the
origin.
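Comment 5 can be illustrated with a short sketch. The data are hypothetical; the fit with a column of 1's plays the role of a package that always includes an intercept, and the fit without it is the regression through the origin called for by model (11.7).

```python
import numpy as np

def correlation_transform(v):
    """Transformation (11.6) applied to a single variable."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / (np.sqrt(v.size - 1) * v.std(ddof=1))

# Hypothetical data for illustration.
x1 = np.array([3.0, 5.0, 4.0, 8.0, 6.0, 7.0])
x2 = np.array([1006.0, 1010.0, 1003.0, 1012.0, 1001.0, 1009.0])
y = np.array([12.0, 19.0, 13.0, 27.0, 16.0, 24.0])

y_t, x1_t, x2_t = (correlation_transform(v) for v in (y, x1, x2))

# Fit WITH an intercept column, as a package lacking a
# through-the-origin option would do.
X_with = np.column_stack([np.ones_like(x1_t), x1_t, x2_t])
b_with, *_ = np.linalg.lstsq(X_with, y_t, rcond=None)

# Fit through the origin, as model (11.7) specifies.
X_origin = np.column_stack([x1_t, x2_t])
b_origin, *_ = np.linalg.lstsq(X_origin, y_t, rcond=None)

print("with intercept:    ", b_with)     # first entry is zero up to rounding
print("through the origin:", b_origin)
```

The intercept is zero apart from floating-point rounding, and the remaining two coefficients agree between the fits.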

11.2  PROBLEMS  OF  MULTICOLLINEARITY 

When  we  discussed  multiple  regression  in  Chapter  8,  we  noted  some  key 
problems  that  typically  arise  when  the  independent  variables  which  are  being 
considered  for  the  model  are  highly  correlated  among  themselves: 

1.  Adding or deleting an independent variable changes the regression coefficients.

2.  The extra sum of squares associated with an independent variable varies,
depending upon which independent variables already are included in the
model.

3.  The estimated regression coefficients individually may not be statistically
significant even though a definite statistical relation exists between the dependent
variable and the set of independent variables.

These problems can also arise without substantial multicollinearity being present,
but only under unusual circumstances not likely to be found in practice.

We shall now expand on the topic of multicollinearity because high intercorrelations
among independent variables are frequently found in nonexperimental
data in management and the social and biological sciences. For example, such
pairs of independent variables as family income and liquid assets and store sales
and number of employees would tend to be correlated highly.

Nature  of  problem 

To see the essential nature of the problem of multicollinearity, we shall employ
a simple example. The data in Table 11.1 refer to four sample observations
on a dependent variable and two independent variables. Mr. A was asked to fit
the multiple regression model:

(11.15)  E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2

He returned in a short time with the fitted model:

(11.16)  \hat{Y} = -87 + X_1 + 18 X_2

He was proud of this model because it fit the data perfectly. The fitted values are
shown in Table 11.1.

It so happened that Ms. B also was asked to fit model (11.15) to the same
data, and she arrived at the fitted model:

(11.17)  \hat{Y} = -7 + 9 X_1 + 2 X_2

Again, this model fits perfectly, as shown in Table 11.1.


TABLE 11.1  Example of perfectly correlated independent variables

                                          Fitted Values
Observation                        Model (11.16)   Model (11.17)
     i        Xi1    Xi2    Yi

     1          2      6     23          23              23
     2          8      9     83          83              83
     3          6      8     63          63              63
     4         10     10    103         103             103

Model (11.16):  \hat{Y} = -87 + X_1 + 18 X_2
Model (11.17):  \hat{Y} = -7 + 9 X_1 + 2 X_2


Indeed, it can be shown that infinitely many models will fit the data in Table
11.1 perfectly. The reason is that the independent variables X1 and X2 are perfectly
related, according to the relation:

(11.18)  X_2 = 5 + .5 X_1

Note carefully that fitted models (11.16) and (11.17) are entirely different
response surfaces. The regression coefficients are different, and the fitted values
will differ when X1 and X2 do not follow relation (11.18). For example, the fitted
value for model (11.16) when X1 = 5 and X2 = 5 is:

\hat{Y} = -87 + 5 + 18(5) = 8

while the fitted value for model (11.17) is:

\hat{Y} = -7 + 9(5) + 2(5) = 48

Thus, when X1 and X2 are perfectly related and, as in our example, the data do
not contain any random error component, many different response functions will
lead to the same perfectly fitted values for the observations and to the same fitted
values for any other (X1, X2) combinations following the relation between X1 and
X2. Yet these response functions are not the same and will lead to different fitted
values for (X1, X2) combinations that do not follow the relation between X1 and
X2.

Two  key  implications  of  this  example  are: 

1.  The perfect relation between X1 and X2 did not inhibit our ability to obtain a
good fit to the data.

2.  Since many different models provide the same good fit, one cannot interpret
any one set of regression coefficients as reflecting the effects of the different
independent variables. Thus, in fitted model (11.16), b1 = 1 and b2 = 18 do
not imply that X2 is the key independent variable and X1 plays little role,
because model (11.17) provides an equally good fit and its regression coefficients
have opposite comparative magnitudes.
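The example of Table 11.1 can be verified directly. The sketch below checks that both fitted models reproduce the data, that X has deficient rank, and that the models disagree off the line (11.18). The helper `family` is our own addition: substituting (11.18) into either fitted model gives Y = 3 + 10 X1 on the line, so any coefficients with b0 + 5 b2 = 3 and b1 + .5 b2 = 10 fit perfectly.

```python
import numpy as np

# Data of Table 11.1.
x1 = np.array([2.0, 8.0, 6.0, 10.0])
x2 = np.array([6.0, 9.0, 8.0, 10.0])
y = np.array([23.0, 83.0, 63.0, 103.0])

def model_a(x1, x2):
    """Fitted model (11.16)."""
    return -87 + x1 + 18 * x2

def model_b(x1, x2):
    """Fitted model (11.17)."""
    return -7 + 9 * x1 + 2 * x2

def family(b2):
    """Coefficients (b0, b1, b2) of a perfectly fitting model for any b2."""
    return 3 - 5 * b2, 10 - 0.5 * b2, b2

# Both models reproduce every observation exactly.
print(model_a(x1, x2), model_b(x1, x2))

# X2 is an exact linear function of X1, so the X matrix (with a column
# of 1's) has rank 2 rather than 3 and X'X has no inverse.
X = np.column_stack([np.ones_like(x1), x1, x2])
rank = np.linalg.matrix_rank(X)
print("rank of X:", rank)

# Off the line (11.18) the two models disagree.
print(model_a(5.0, 5.0), model_b(5.0, 5.0))   # 8.0 and 48.0

# Models (11.16) and (11.17) are two members of the same family.
print(family(18), family(2))
```

Choosing any other value of b2 in `family` yields yet another model that fits the four observations without error.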


Effects  of  multicollinearity 

In  actual  practice,  we  seldom  find  independent  variables  that  are  perfectly 
related  or  data  that  do  not  contain  some  random  error  component.  Nevertheless, 
the  implications  just  noted  for  our  idealized  example  still  have  relevance. 

1.  The fact that some or all independent variables are correlated among themselves
does not, in general, inhibit our ability to obtain a good fit nor does it tend
to affect inferences about mean responses or predictions of new observations,
provided these inferences are made within the region of observations. (Figure
7.10 on p. 261 provides an illustration of the concept of the region of observations
for the case of two independent variables.)

2.  The counterpart in real life to the many different regression functions providing
equally good fits to the data in our idealized example is that the estimated
regression coefficients tend to have large sampling variability when the independent
variables are highly correlated. Thus, the estimated regression coefficients
tend to vary widely from one sample to the next when the independent
variables are highly correlated. As a result, only imprecise information may be
available about the individual true regression coefficients. Indeed, each of the
estimated regression coefficients individually may be statistically not significant
even though a definite statistical relation exists between the dependent variable
and the set of independent variables.

3.  The common interpretation of regression coefficients as measuring the
change in the expected value of the dependent variable when the corresponding
independent variable is increased by one unit while all other independent variables
are held constant is not fully applicable when multicollinearity exists. While
it may be conceptually possible to vary one independent variable and hold the
others constant, it may not be possible in practice to do so for independent
variables that are highly correlated. For example, in a regression model for predicting
crop yield from amount of rainfall and hours of sunshine, the relation
between the two independent variables makes it unrealistic to consider varying
one while holding the other constant. Therefore, the simple interpretation of the
regression coefficients as measuring marginal effects is often unwarranted with
highly correlated independent variables.
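Points 1 and 2 above can be illustrated by a small sampling experiment. Everything in this sketch is an assumption made for illustration: the true coefficients, the error standard deviation of 1, the sample size, and the two correlation levels. For each level, the X values are held fixed and repeated samples of Y are drawn.

```python
import numpy as np

def simulate(r12, n=30, reps=500, seed=0):
    """Return (sd of b1 across samples, sd of the fitted mean at the X centroid)."""
    rng = np.random.default_rng(seed)
    # Fixed X values drawn once, with population correlation r12.
    cov = np.array([[1.0, r12], [r12, 1.0]])
    X_vars = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    X = np.column_stack([np.ones(n), X_vars])
    beta = np.array([1.0, 2.0, 3.0])     # assumed true coefficients
    x_new = X.mean(axis=0)               # a point inside the region of observations

    b1_draws, fit_draws = [], []
    for _ in range(reps):
        y = X @ beta + rng.normal(0.0, 1.0, size=n)
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        b1_draws.append(b[1])
        fit_draws.append(x_new @ b)
    return np.std(b1_draws), np.std(fit_draws)

sd_b1_low, sd_fit_low = simulate(r12=0.10)
sd_b1_high, sd_fit_high = simulate(r12=0.99)

print("sd of b1:          low corr", sd_b1_low, " high corr", sd_b1_high)
print("sd of fitted mean: low corr", sd_fit_low, " high corr", sd_fit_high)
```

Under these assumptions the sample-to-sample variability of b1 is several times larger when X1 and X2 are highly correlated, while the variability of the estimated mean response within the region of observations is about the same.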

Example.  To illustrate these basic points, consider the data in Table 11.2 for
the body fat example which was discussed earlier in Chapter 8. We shall now
consider also a third independent variable, midarm circumference (X3), in
addition to triceps skinfold thickness (X1) and thigh circumference (X2), the two
independent variables previously considered.

TABLE 11.2  Body fat example with three independent variables, two of which
are highly correlated

             Triceps
Subject   Skinfold Thickness   Thigh Circumference   Midarm Circumference   Body Fat
   i            Xi1                   Xi2                    Xi3               Yi

   1            19.5                  43.1                   29.1             11.9
   2            24.7                  49.8                   28.2             22.8
   3            30.7                  51.9                   37.0             18.7
   4            29.8                  54.3                   31.1             20.1
   5            19.1                  42.2                   30.9             12.9
   6            25.6                  53.9                   23.7             21.7
   7            31.4                  58.5                   27.6             27.1
   8            27.9                  52.1                   30.6             25.4
   9            22.1                  49.9                   23.2             21.3
  10            25.5                  53.5                   24.8             19.3
  11            31.1                  56.6                   30.0             25.4
  12            30.4                  56.7                   28.3             27.2
  13            18.7                  46.5                   23.0             11.7
  14            19.7                  44.2                   28.6             17.8
  15            14.6                  42.7                   21.3             12.8
  16            29.5                  54.4                   30.1             23.9
  17            27.7                  55.3                   25.7             22.6
  18            30.2                  58.6                   24.6             25.4
  19            22.7                  48.2                   27.1             14.8
  20            25.2                  51.0                   27.5             21.1

Suppose that we regress body fat (Y) on triceps skinfold thickness (X1) only.
The results of a least squares fit of the response function:

(11.19)  E(Y) = \beta_0 + \beta_1 X_1

are shown in Table 11.3a. The coefficient of determination r_{Y1}^2 (the notation
shows that the relation between Y and X1 is being considered) is:

r_{Y1}^2 = \frac{SSR(X_1)}{SSTO} = \frac{352.27}{495.39} = .711

which indicates that the variability of Y is reduced by 71 percent by considering
independent variable X1. Also note that s(b1) is relatively small:

\frac{s(b_1)}{b_1} = \frac{.1288}{.8572} = .1503


Let us now add independent variable X2 to the model. This variable is highly
correlated with X1. The coefficient of determination between the two independent
variables X1 and X2, denoted by r_{12}^2, is r_{12}^2 = .853. The results of fitting the
response function:

(11.20)  E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2

are shown in Table 11.3b. Note the following:

1.  The fit of model (11.20) has not been made worse in the sense of a higher
error sum of squares SSE, despite the introduction of a highly correlated independent
variable. Indeed, we noted earlier that SSE can never increase as the
result of introducing another independent variable. (MSE can increase if the
reduction in SSE is not adequate to compensate for the loss of one degree of
freedom.) For our example, the coefficient of multiple determination is:

R^2 = \frac{SSR(X_1, X_2)}{SSTO} = \frac{385.44}{495.39} = .778

indicating that the variability of Y is reduced by 78 percent when both X1 and X2
are considered. Further, MSE = 6.47 now, as compared with MSE = 7.95 when
only X1 is included in the model.

2.  The estimate of the regression coefficient β1 for model (11.20) has larger
sampling variability than before; the sampling variation of b1 now is s(b1) =
.3034 as compared to .1288 when only X1 is included in the model. Also, the
relative sampling variation of b2 is quite large:

\frac{s(b_2)}{b_2} = \frac{.2912}{.6594} = .4416


TABLE 11.3  Regression results for body fat example

(a) Regression of Y on X1
\hat{Y} = -1.496 + .8572 X_1

Source of Variation       SS       df       MS
Regression              352.27      1     352.27
Error                   143.12     18       7.95
Total                   495.39     19

Variable    Estimated Regression Coefficient    Estimated Standard Deviation
X1                  b1 = .8572                        s(b1) = .1288

(b) Regression of Y on X1 and X2
\hat{Y} = -19.174 + .2224 X_1 + .6594 X_2

Source of Variation       SS       df       MS
Regression              385.44      2     192.72
Error                   109.95     17       6.47
Total                   495.39     19

Variable    Estimated Regression Coefficient    Estimated Standard Deviation
X1                  b1 = .2224                        s(b1) = .3034
X2                  b2 = .6594                        s(b2) = .2912

(c) Regression of Y on X1, X2, and X3
\hat{Y} = 117.08 + 4.334 X_1 - 2.857 X_2 - 2.186 X_3

Source of Variation       SS       df       MS
Regression              396.98      3     132.33
Error                    98.41     16       6.15
Total                   495.39     19

Variable    Estimated Regression Coefficient    Estimated Standard Deviation
X1                  b1 = 4.334                        s(b1) = 3.016
X2                  b2 = -2.857                       s(b2) = 2.582
X3                  b3 = -2.186                       s(b3) = 1.596


Indeed, separate tests of β1 and β2, each at the level of significance .01,
would lead to the conclusion that β1 = 0 and β2 = 0, whereas a test of the entire
regression relation, based on:

F^* = \frac{MSR(X_1, X_2)}{MSE(X_1, X_2)} = \frac{192.72}{6.47} = 29.787

would lead to the conclusion, for level of significance .01, that a regression
relation does exist.
Let us finally add independent variable X3 to the model. This variable is not
highly correlated with either of the other two independent variables, the coefficients
of determination between X3 and the other two independent variables
being r_{13}^2 = .210 and r_{23}^2 = .007, respectively. The results of fitting the response
function:

(11.21)  E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3

are shown in Table 11.3c. Note the following:

1.  The fit of the model has been improved somewhat further, the coefficient
of multiple determination being:

R^2 = \frac{SSR(X_1, X_2, X_3)}{SSTO} = \frac{396.98}{495.39} = .801

as compared to R^2 = .778 for model (11.20). Further, MSE(X1, X2, X3) = 6.15
now, as compared with MSE(X1, X2) = 6.47 for the model with X1 and X2 only.

2.  The estimate of the regression coefficient β2 has actually changed signs
(the estimate changed from .6594 to -2.857). In addition, the sampling variations
of b1 and b2 in model (11.21) both increased dramatically; s(b1) = 3.016
now whereas it was .3034 for model (11.20), and s(b2) = 2.582 now whereas
before it was .2912. Again the high degree of multicollinearity among the independent
variables X1 and X2 is responsible for the inflated variability of the
estimates of the regression coefficients. Separate tests of β1, β2, and β3, each at
the level of significance .01, would lead to the conclusion that β1 = 0, β2 = 0,
and β3 = 0, whereas a test for the entire regression relation, based on:

F^* = \frac{MSR(X_1, X_2, X_3)}{MSE(X_1, X_2, X_3)} = \frac{132.33}{6.15} = 21.517

would lead to the conclusion, at the level of significance .01, that a regression
relation does exist.

3.  To the extent that a change in thigh circumference (X2) is almost always
accompanied by a corresponding change in triceps skinfold thickness (X1), the
usefulness of the measures β1 and β2 is diminished because they reflect the effect
of a change in one variable with no change in the other.
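The entries of Table 11.3 can be recomputed from the data of Table 11.2. The following is a minimal sketch using NumPy; the helper `fit` is ours, subject 15's triceps value is read as 14.6, and the standard deviations are obtained from MSE times the diagonal of (X'X)^{-1}, which is adequate in double precision for data of this size.

```python
import numpy as np

# Table 11.2: triceps (X1), thigh (X2), midarm (X3), body fat (Y).
# Subject 15's triceps value is read as 14.6.
data = np.array([
    [19.5, 43.1, 29.1, 11.9], [24.7, 49.8, 28.2, 22.8],
    [30.7, 51.9, 37.0, 18.7], [29.8, 54.3, 31.1, 20.1],
    [19.1, 42.2, 30.9, 12.9], [25.6, 53.9, 23.7, 21.7],
    [31.4, 58.5, 27.6, 27.1], [27.9, 52.1, 30.6, 25.4],
    [22.1, 49.9, 23.2, 21.3], [25.5, 53.5, 24.8, 19.3],
    [31.1, 56.6, 30.0, 25.4], [30.4, 56.7, 28.3, 27.2],
    [18.7, 46.5, 23.0, 11.7], [19.7, 44.2, 28.6, 17.8],
    [14.6, 42.7, 21.3, 12.8], [29.5, 54.4, 30.1, 23.9],
    [27.7, 55.3, 25.7, 22.6], [30.2, 58.6, 24.6, 25.4],
    [22.7, 48.2, 27.1, 14.8], [25.2, 51.0, 27.5, 21.1],
])
x1, x2, x3, y = data.T
n = len(y)

def fit(*predictors):
    """Least squares fit with intercept; returns (b, s(b), SSE, MSE)."""
    X = np.column_stack([np.ones(n), *predictors])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    sse = float(resid @ resid)
    mse = sse / (n - X.shape[1])
    s_b = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))
    return b, s_b, sse, mse

for label, preds in [("X1", (x1,)), ("X1, X2", (x1, x2)),
                     ("X1, X2, X3", (x1, x2, x3))]:
    b, s_b, sse, mse = fit(*preds)
    print(label, "| b:", np.round(b, 4), "| s(b):", np.round(s_b, 4),
          "| SSE:", round(sse, 2), "| MSE:", round(mse, 2))

# Coefficients of determination among the independent variables.
r_sq = np.corrcoef(np.vstack([x1, x2, x3])) ** 2
print(np.round(r_sq, 3))
```

The printed standard deviations show the progression discussed above: s(b1) grows from about .13 to about .30 to about 3.0 as X2 and then X3 enter the model, while SSE falls only modestly.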

Comments 

1.  It was noted in Section 11.1 that a near-zero determinant of X'X is a potential
source of serious roundoff errors in least squares results. Severe multicollinearity has the
effect of making this determinant come close to zero. Thus, under severe multicollinearity,
the regression coefficients may be subject to large roundoff errors as well as large
sampling variances. Hence, it is particularly advisable to employ the correlation
transformation (11.6) when multicollinearity is present.

2.  Just as high intercorrelations between the independent variables tend to make the
estimated regression coefficients imprecise (i.e., erratic from sample to sample), so do the
coefficients of partial correlation between the dependent variable and each of the independent
variables tend to become erratic from sample to sample when the independent
variables are highly correlated.

3.  The  effect  of  intercorrelations  between  the  independent  variables  on  the  standard 
deviations  of  the  estimated  regression  coefficients  can  be  seen  readily  when  the  variables 
in  the  model  are  transformed  by  means  of  the  correlation  transformation  (11.6).  Consider 
the  model  with  two  independent  variables: 

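$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i \tag{11.22}$$

This model in the transformed variables is given by (11.7) and is:

$$Y_i' = \beta_1' X_{i1}' + \beta_2' X_{i2}' + \varepsilon_i' \tag{11.23}$$

The $(\mathbf{X}'\mathbf{X})^{-1}$ matrix for this transformed model is given by (11.11):

$$(\mathbf{X}'\mathbf{X})^{-1} = \mathbf{r}_{XX}^{-1} = \frac{1}{1 - r_{12}^2} \begin{bmatrix} 1 & -r_{12} \\ -r_{12} & 1 \end{bmatrix} \tag{11.24}$$

where $r_{12}$ is the coefficient of correlation between $X_1$ and $X_2$. Hence, the
variance-covariance matrix of the estimated regression coefficients is by (7.37):

$$\sigma^2(\mathbf{b}) = (\sigma')^2 \mathbf{r}_{XX}^{-1} = (\sigma')^2 \frac{1}{1 - r_{12}^2} \begin{bmatrix} 1 & -r_{12} \\ -r_{12} & 1 \end{bmatrix} \tag{11.25}$$

where $(\sigma')^2$ is the error term variance for the transformed model (11.23).

Thus, the estimated regression coefficients $b_1'$ and $b_2'$ have the same variance:

$$\sigma^2(b_1') = \sigma^2(b_2') = \frac{(\sigma')^2}{1 - r_{12}^2} \tag{11.26}$$

which becomes larger as the correlation between $X_1$ and $X_2$ increases. Indeed, as $X_1$ and
$X_2$ approach perfect correlation (i.e., as $r_{12}^2$ approaches 1), the variances of $b_1'$ and $b_2'$
become larger without limit.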

4. We have noted that high multicollinearity is usually not a problem when the purpose
of the regression analysis is to make inferences on the response function or predictions
of new observations, provided these inferences are made within the range of observations.
In our body fat example, for instance, the estimated mean body fat when the only
independent variable included in the model is triceps skinfold thickness ($X_1$), together
with its estimated standard deviation, are as follows for $X_{h1} = 25.0$ (calculations not
shown):

$$\hat{Y}_h = 19.93 \qquad s(\hat{Y}_h) = .632$$

When the highly correlated independent variable thigh circumference ($X_2$) is also
included in the model, the estimated mean body fat, together with its estimated standard
deviation, are as follows for $X_{h1} = 25.0$ and $X_{h2} = 50.0$:

$$\hat{Y}_h = 19.36 \qquad s(\hat{Y}_h) = .624$$




390  /  Multicollinearity,  influential  observations,  and  other  topics — II 

Thus, the precision of the estimated mean response is equally good as before, despite the
addition of the second independent variable which is highly correlated with the first one.
This stability in the precision of the estimated mean response occurred despite the fact that
the estimated standard deviation of $b_1$ became substantially larger when $X_2$ was added to
the model (Table 11.3). The essential reason for the stability is that the covariance between
$b_1$ and $b_2$ is negative, and exerts a strong counteracting influence on the increase in
$s^2(b_1)$ in determining the value of $s^2(\hat{Y}_h)$ as given in (7.68).

When all three independent variables are included in the model, the estimated mean
body fat, together with its estimated standard deviation, are as follows for $X_{h1} = 25.0$,
$X_{h2} = 50.0$, and $X_{h3} = 29.0$:

$$\hat{Y}_h = 19.19 \qquad s(\hat{Y}_h) = .621$$

Thus,  the  addition  of  the  third  independent  variable,  which  is  not  highly  correlated  with 
the  first  two  independent  variables,  does  not  materially  affect  the  precision  of  the  esti¬ 
mated  mean  response  either. 
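
The contrast drawn in Comments 3 and 4 can be checked numerically. The following is a minimal sketch in Python with NumPy, assuming the two-variable transformed model (11.23) with error variance $(\sigma')^2 = 1$. The function names and the evaluation points are illustrative assumptions and do not come from the body fat data. It builds the variance-covariance matrix (11.25) and evaluates the variance of a linear combination of $b_1'$ and $b_2'$, once along the collinearity pattern (both transformed X values moving together) and once against it.

```python
import numpy as np


def coef_covariance(r12: float, sigma_sq: float = 1.0) -> np.ndarray:
    """Variance-covariance matrix of (b1', b2') as given in (11.25)."""
    if abs(r12) >= 1.0:
        raise ValueError("r12 must lie strictly between -1 and 1")
    return sigma_sq / (1.0 - r12 ** 2) * np.array([[1.0, -r12], [-r12, 1.0]])


def combination_variance(x, r12: float, sigma_sq: float = 1.0) -> float:
    """Variance of x1*b1' + x2*b2' for a fixed vector x = (x1, x2)."""
    x = np.asarray(x, dtype=float)
    return float(x @ coef_covariance(r12, sigma_sq) @ x)


for r12 in (0.0, 0.5, 0.9, 0.99):
    cov = coef_covariance(r12)
    print(
        f"r12={r12:4.2f}  var(b1')={cov[0, 0]:7.2f}  cov(b1',b2')={cov[0, 1]:7.2f}  "
        f"along pattern={combination_variance([1, 1], r12):5.2f}  "
        f"against pattern={combination_variance([1, -1], r12):7.2f}"
    )
```

Under these assumptions the variance of each coefficient grows as in (11.26), while the variance of the combination along the pattern equals $2/(1 + r_{12})$ and stays bounded, because the negative covariance term offsets the inflated variances. The combination taken against the pattern has variance $2/(1 - r_{12})$, which grows without limit.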

11.3  VARIANCE  INFLATION  FACTORS  AND  OTHER  METHODS 
OF  DETECTING  MULTICOLLINEARITY 

A  variety  of  informal  and  formal  methods  have  been  developed  for  detecting 
the  presence  of  serious  multicollinearity. 

Informal  methods 

Indications  of  the  presence  of  serious  multicollinearity  are  given  by  the  fol¬ 
lowing  diagnostics: 

1.  Large  changes  in  the  estimated  regression  coefficients  when  a  variable  is 
added  or  deleted,  or  when  an  observation  is  altered  or  deleted. 

2.  Nonsignificant  results  in  individual  tests  on  the  regression  coefficients  for 
important  independent  variables. 

3.  Estimated  regression  coefficients  with  an  algebraic  sign  that  is  the  opposite 
of  that  expected  from  theoretical  considerations  or  prior  experience. 

4. Large coefficients of correlation between pairs of independent variables in
the correlation matrix $\mathbf{r}_{XX}$.

5.  Wide  confidence  intervals  for  the  regression  coefficients  representing  im¬ 
portant  independent  variables. 

Example. In the body fat example, the independent variables triceps skinfold
thickness and thigh circumference are highly correlated with each other.
Also, we noted large changes in the estimated regression coefficients and their
estimated standard errors when a variable was added, nonsignificant results in
individual tests on anticipated important variables, and an estimated negative coefficient
when a positive coefficient was expected. Therefore, serious multicollinearity
among the independent variables is suspected.

Note 

The  informal  methods  just  described  have  important  limitations.  They  do  not  provide 
quantitative  measurements  of  the  impact  of  multicollinearity  nor  may  they  identify  the 


1 1 .3  Variance  inflation  factors  and  other  detection  methods  /  391 


nature of the multicollinearity. For instance, if independent variables $X_1$, $X_2$, and $X_3$ have
low pairwise correlations, the correlation matrix $\mathbf{r}_{XX}$ will provide no indication of the
presence of multicollinearity even though the three variables may be closely related as a
group. Thus, examination of simple correlation coefficients will not necessarily disclose
the existence of relations among groups of independent variables.

Another  limitation  of  the  informal  diagnostic  methods  is  that  sometimes  the  observed 
behavior  may  occur  without  multicollinearity  being  present. 


Variance  inflation  factors 

One  formal  method  of  detecting  the  presence  of  multicollinearity  is  by  means 
of  variance  inflation  factors.  These  factors  measure  how  much  the  variances  of 
the  estimated  regression  coefficients  are  inflated  as  compared  to  when  the  inde¬ 
pendent  variables  are  not  linearly  related. 

To  understand  the  significance  of  variance  inflation  factors,  we  begin  with  the 
precision  of  least  squares  estimated  regression  coefficients,  which  is  measured 
by  their  variances.  We  know  from  (7.37)  that  the  variance-covariance  matrix  of 
the  estimated  regression  coefficients  is: 

$$\sigma^2(\mathbf{b}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1} \tag{11.27}$$

To reduce roundoff errors in calculating $(\mathbf{X}'\mathbf{X})^{-1}$, we noted in Section 11.1 that
it is desirable to first transform the variables by means of the correlation transformation
(11.6). In the transformed model, the estimated regression coefficients $b_k'$
are, as we have seen, the standardized coefficients defined in (7.69). The
variance-covariance matrix of the estimated standardized regression coefficients is
according to (11.25):

$$\sigma^2(\mathbf{b}) = (\sigma')^2 \mathbf{r}_{XX}^{-1} \tag{11.28}$$

where $\mathbf{r}_{XX}$ is the matrix of the pairwise simple correlation coefficients among
the independent variables, as illustrated in (11.10) for $p - 1 = 2$ independent
variables, and $(\sigma')^2$ is the error term variance for the transformed model.

Note from (11.28) that the variance of $b_k'$ ($k = 1, \ldots, p - 1$) is equal to the
product of the error term variance $(\sigma')^2$ and the kth diagonal element of the
matrix $\mathbf{r}_{XX}^{-1}$. This second factor is called the variance inflation factor (VIF). It
can be shown that the variance inflation factor for $b_k'$, denoted by $(VIF)_k$, is:

$$(VIF)_k = (1 - R_k^2)^{-1} \qquad k = 1, 2, \ldots, p - 1 \tag{11.29}$$

where $R_k^2$ is the coefficient of multiple determination when $X_k$ is regressed on the
$p - 2$ other X variables in the model. Hence, we have:

$$\sigma^2(b_k') = (\sigma')^2 (VIF)_k = \frac{(\sigma')^2}{1 - R_k^2} \tag{11.30}$$

We presented in (11.26) the special results for $\sigma^2(b_k')$ when $p - 1 = 2$, for
which $R_k^2 = r_{12}^2$, the coefficient of simple determination between $X_1$ and $X_2$.

The variance inflation factor $(VIF)_k$ is equal to 1 when $R_k^2 = 0$, i.e., when $X_k$
is not linearly related to the other X variables. When $R_k^2 \neq 0$, then $(VIF)_k$ is
greater than 1, indicating an inflated variance for $b_k'$. This is evident from (11.30)
since the denominator becomes smaller as $R_k^2$ becomes larger, leading to a larger
variance. When $X_k$ has a perfect linear association with the other X variables in
the model so that $R_k^2 = 1$, then $(VIF)_k$ and $\sigma^2(b_k')$ are unbounded.

The  largest  ( VIF)k  among  all  X  variables  is  often  used  as  an  indicator  of  the 
severity  of  multicollinearity.  A  maximum  ( VIF)k  in  excess  of  10  is  often  taken  as 
an  indication  that  multicollinearity  may  be  unduly  influencing  the  least  squares 
estimates. 

The mean of the $(VIF)_k$ values also provides information about the severity of the
multicollinearity in terms of how far the estimated standardized regression coefficients
$b_k'$ are from the true values $\beta_k'$. It can be shown that the expected value of
the sum of these squared errors $(b_k' - \beta_k')^2$ is given by:

$$E\left\{\sum_{k=1}^{p-1} (b_k' - \beta_k')^2\right\} = (\sigma')^2 \sum_{k=1}^{p-1} (VIF)_k \tag{11.31}$$

Thus,  large  ( VIF)k  values  result,  on  the  average,  in  larger  differences  between 
the  estimated  and  true  standardized  regression  coefficients. 

When no X variable is linearly related to the others in the model, $R_k^2 = 0$;
hence, $(VIF)_k = 1$ and:

$$E\left\{\sum_{k=1}^{p-1} (b_k' - \beta_k')^2\right\} = (\sigma')^2 (p - 1) \qquad \text{when } (VIF)_k = 1 \tag{11.31a}$$

A ratio of the results in (11.31) and (11.31a) provides useful information about
the effect of multicollinearity on the sum of the squared errors:

$$\frac{(\sigma')^2 \sum (VIF)_k}{(\sigma')^2 (p - 1)} = \frac{\sum (VIF)_k}{p - 1}$$

Note that this ratio is simply the mean of the $(VIF)_k$ factors, to be denoted by
$\overline{(VIF)}$:

$$\overline{(VIF)} = \frac{\sum_{k=1}^{p-1} (VIF)_k}{p - 1} \tag{11.32}$$

Mean VIF values considerably larger than 1 are indicative of serious multicollinearity
problems.
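
The two routes to $(VIF)_k$ described above, the diagonal of $\mathbf{r}_{XX}^{-1}$ and formula (11.29), can be compared numerically. The sketch below uses Python with NumPy on synthetic data. The data, the variable names, and the helper functions are hypothetical and are not part of the body fat example.

```python
import numpy as np


def variance_inflation_factors(X) -> np.ndarray:
    """(VIF)_k as the diagonal of the inverse correlation matrix of the X columns."""
    r_xx = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(r_xx))


def r_squared_on_others(X, k: int) -> float:
    """R_k^2 from regressing column k on the remaining columns (with intercept)."""
    X = np.asarray(X, dtype=float)
    y = X[:, k]
    design = np.column_stack([np.ones(len(y)), np.delete(X, k, axis=1)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


# Synthetic data (hypothetical): x2 is built to track x1 closely, x3 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=60)
x2 = x1 + 0.2 * rng.normal(size=60)
x3 = rng.normal(size=60)
X = np.column_stack([x1, x2, x3])

vif = variance_inflation_factors(X)
vif_from_r2 = np.array([1.0 / (1.0 - r_squared_on_others(X, k)) for k in range(3)])

print("VIF from inverse correlation matrix:", np.round(vif, 2))
print("VIF from (11.29):                   ", np.round(vif_from_r2, 2))
print("mean VIF as in (11.32):", round(float(vif.mean()), 2))
```

The two computations should agree up to floating-point error. In this constructed data set the first two variables receive large factors and the third a factor close to 1.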

Example. Table 11.4 contains the estimated standardized regression coefficients
and the $(VIF)_k$ values for our body fat example. The maximum $(VIF)_k$ is
708.84 and $\overline{(VIF)} = 459.26$. Thus, the expected sum of the squared errors in the
least squares regression coefficients is nearly 460 times as large as it would be if
the X variables were uncorrelated. In addition, all three $(VIF)_k$ factors greatly
exceed 10, which again indicates that serious multicollinearity problems exist.


1 1 .4  Ridge  regression  and  other  remedial  measures  for  multicollinearity  /  393 


TABLE 11.4  Variance inflation factors for body fat example

Variable      $b_k'$      $(VIF)_k$
$X_1$         4.2637      708.84
$X_2$        -1.5614      104.61
$X_3$        -2.9287      564.34

Maximum $(VIF)_k$ = 708.84      $\overline{(VIF)}$ = 459.26


It is interesting to note that $(VIF)_3 = 564$ despite the fact that both $r_{13}^2$ and $r_{23}^2$
are small. Here is an instance where $X_3$ is strongly related to $X_1$ and $X_2$ together
($R_3^2 = .998$), even though the pairwise coefficients of simple determination are
small. Examination of the correlation matrix $\mathbf{r}_{XX}$ would not have disclosed this
multicollinearity.

Comments 

1. A number of computer regression programs use the reciprocal of the variance
inflation factor to detect instances where an X variable should not be allowed into the
fitted regression model because of excessively high interdependence between this variable
and the other X variables in the model. Tolerance limits for $1/(VIF)_k = 1 - R_k^2$ frequently
used are .01, .001, or .0001, below which the variable is not entered into the
model.

2.  A  limitation  of  variance  inflation  factors  for  detecting  multicollinearities  is  that 
they  cannot  distinguish  between  several  simultaneous  multicollinearities. 

3.  A  number  of  other  methods  for  detecting  multicollinearity  have  been  proposed. 
These  are  more  complex  than  variance  inflation  factors  and  are  discussed  in  specialized 
texts  such  as  Reference  11.1. 
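
The tolerance screen described in Comment 1 can be sketched as follows in Python. The function name and its default limit are assumptions made for illustration, and actual regression programs may apply the check differently. The three values are the $(VIF)_k$ figures reported in Table 11.4.

```python
def passes_tolerance(vif: float, limit: float = 0.01) -> bool:
    """Return True if 1/(VIF)_k = 1 - R_k^2 is at or above the tolerance limit."""
    return 1.0 / vif >= limit


table_11_4_vifs = (708.84, 104.61, 564.34)

for limit in (0.01, 0.001, 0.0001):
    entered = [passes_tolerance(v, limit) for v in table_11_4_vifs]
    print(f"tolerance limit {limit}: variable entered? {entered}")
```

With these values, none of the three variables would clear a limit of .01, while all three would clear .001 and .0001, which shows how much the outcome of such a screen depends on the limit chosen.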


11.4  RIDGE  REGRESSION  AND  OTHER  REMEDIAL  MEASURES 
FOR  MULTICOLLINEARITY 

A  variety  of  remedial  measures  have  been  proposed  for  the  difficulties  caused 
by  multicollinearity.  Some  of  these  leave  intact  the  method  of  least  squares  for 
estimating  the  regression  coefficients  while  others  introduce  modifications  in  the 
method  of  estimation. 


Remedial  measures  with  ordinary  least  squares 

We  consider  first  remedial  measures  that  may  be  employed  with  ordinary  least 
squares. 

1. As we have seen, the presence of serious multicollinearity often does not
affect the usefulness of the fitted model for making inferences about mean responses
or making predictions, provided that the values of the independent variables
for which inferences are to be made follow the same multicollinearity pattern
as the data on which the regression model is based. Hence, one remedial
measure is to restrict the use of the fitted regression model to inferences for
values of the independent variables which follow the same pattern of multicollinearity.

2.  In  polynomial  regression  models,  as  we  noted  in  Chapter  9,  expressing  the 
independent  variable(s)  in  the  form  of  deviations  from  the  mean  serves  to  reduce 
substantially  the  multicollinearity  among  the  first-order,  second-order,  and 
higher-order  terms  for  any  given  independent  variable. 

3.  One  or  several  independent  variables  may  be  dropped  from  the  model  in 
order  to  lessen  the  multicollinearity  and  thereby  reduce  the  standard  errors  of  the 
estimated  regression  coefficients  of  the  independent  variables  remaining  in  the 
model.  This  remedial  measure  has  two  important  limitations.  First,  no  direct 
information  is  obtained  about  the  dropped  independent  variables.  Second,  the 
magnitudes  of  the  regression  coefficients  for  the  independent  variables  remaining 
in  the  model  are  affected  by  the  correlated  independent  variables  not  included  in 
the  model. 

4.  Sometimes  it  is  possible  to  add  some  observations  which  break  the  pattern 
of  multicollinearity.  Often,  however,  this  option  is  not  available.  In  business  and 
economics,  for  instance,  many  independent  variables  cannot  be  controlled,  so 
that  new  observations  will  tend  to  show  the  same  intercorrelation  patterns  as  the 
earlier  observations. 

5.  In  some  economic  studies,  it  is  possible  to  estimate  the  regression  coeffi¬ 
cients  for  different  independent  variables  from  different  sets  of  data  to  avoid  the 
problems  of  multicollinearity.  Demand  studies,  for  instance,  may  use  both 
cross-section  and  time  series  data  to  this  end.  Suppose  the  independent  variables 
in  a  demand  study  are  price  and  income,  and  the  relation  to  be  estimated  is: 

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i \tag{11.33}$$

where Y is demand, $X_1$ is income, and $X_2$ is price. The income coefficient $\beta_1$ may
then be estimated from cross-section data. The demand variable Y is thereupon
adjusted:

$$Y_i' = Y_i - b_1 X_{i1} \tag{11.34}$$

Finally, the price coefficient $\beta_2$ is estimated by regressing the adjusted demand
variable $Y'$ on $X_2$.
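
Remedial measure 2 above, expressing the independent variable as deviations from its mean in a polynomial model, can be illustrated with a small sketch in Python with NumPy. The X values are hypothetical and were chosen to be evenly spaced.

```python
import numpy as np

x = np.arange(1.0, 11.0)             # hypothetical X values 1, 2, ..., 10
x_centered = x - x.mean()            # deviations from the mean

# Correlation between the first-order and second-order terms, before and after centering.
r_raw = np.corrcoef(x, x ** 2)[0, 1]
r_centered = np.corrcoef(x_centered, x_centered ** 2)[0, 1]

print(f"correlation of X with X^2:                   {r_raw:.4f}")
print(f"correlation of centered x with centered x^2: {r_centered:.4f}")
```

For these evenly spaced values the raw correlation is about .97, while the correlation after centering is essentially zero because the values are symmetric about their mean. With asymmetric data the reduction is typically substantial but need not be complete.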


Ridge  regression 

Biased estimation. Ridge regression is one of several methods that have
been proposed to remedy multicollinearity problems by modifying the method of
least squares to allow biased estimators of the regression coefficients. When an
estimator has only a small bias and is substantially more precise than an unbiased
estimator, it may well be the preferred estimator since it will have a larger
probability of being close to the true parameter value. Figure 11.1 illustrates this
situation. Estimator $b$ is unbiased but imprecise, while estimator $b^R$ is much
more precise but has a small bias. The probability that $b^R$ falls near the true value
$\beta$ is much greater than that for the unbiased estimator $b$.

FIGURE 11.1  Biased estimator with small variance may be preferable to
unbiased estimator with large variance

[Figure not reproduced: sampling distributions of $b$ and $b^R$ plotted against the
parameter $\beta$, with the bias of $b^R$ marked.]

A measure of the combined effect of bias and sampling variation is the expected
value of the squared deviation of the biased estimator $b^R$ from the true
parameter $\beta$. This measure is called the mean squared error, and it can be shown
to equal:

$$E(b^R - \beta)^2 = \sigma^2(b^R) + [E(b^R) - \beta]^2 \tag{11.35}$$

Thus,  the  mean  squared  error  equals  the  variance  of  the  estimator  plus  the 
squared  bias.  Note  that  if  the  estimator  is  unbiased,  the  mean  squared  error  is 
identical  to  the  variance  of  the  estimator. 
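
The decomposition (11.35) and the situation pictured in Figure 11.1 can be illustrated by simulation. The following Python sketch with NumPy assumes normal sampling distributions, a true parameter value of 2, and specific standard deviations and bias. All of these numbers are hypothetical and serve only to show the mechanics.

```python
import numpy as np

rng = np.random.default_rng(1)
beta = 2.0                 # hypothetical true parameter value
n_draws = 200_000

# Hypothetical sampling distributions: b is unbiased with a large standard
# deviation; b_r has a small bias (0.3) and a much smaller standard deviation.
b = beta + rng.normal(0.0, 2.0, size=n_draws)
b_r = beta + 0.3 + rng.normal(0.0, 0.5, size=n_draws)


def mse_parts(estimates, true_value):
    """Return (mean squared error, variance, squared bias) of simulated draws."""
    mse = np.mean((estimates - true_value) ** 2)
    variance = np.var(estimates)                      # population form (ddof=0)
    bias_sq = (np.mean(estimates) - true_value) ** 2
    return mse, variance, bias_sq


for name, draws in (("b", b), ("b_r", b_r)):
    mse, variance, bias_sq = mse_parts(draws, beta)
    near = np.mean(np.abs(draws - beta) < 1.0)        # share within 1 unit of beta
    print(f"{name:4s} MSE={mse:.3f}  variance={variance:.3f}  "
          f"bias^2={bias_sq:.3f}  share near beta={near:.3f}")
```

In this setup the biased estimator has the smaller mean squared error and falls within one unit of the parameter more often, which is the case in which a biased estimator may be preferred.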

Ridge  estimators.  For  ordinary  least  squares,  the  normal  equations  are 
given  by  (7.20): 

(11.36)  (X'X)b  =  X'Y 

When  all  variables  are  transformed  by  the  correlation  transformation  (11.6),  the 
transformed  regression  model  is  given  by: 

$$Y_i' = \beta_1' X_{i1}' + \beta_2' X_{i2}' + \cdots + \beta_{p-1}' X_{i,p-1}' + \varepsilon_i' \tag{11.37}$$

and the least squares normal equations become:

$$\mathbf{r}_{XX}\mathbf{b} = \mathbf{r}_{YX} \tag{11.38}$$

where $\mathbf{r}_{XX}$ is a $(p - 1) \times (p - 1)$ matrix containing the pairwise coefficients of
simple correlation between the independent variables:

$$\underset{(p-1) \times (p-1)}{\mathbf{r}_{XX}} = \begin{bmatrix} 1 & r_{12} & \cdots & r_{1,p-1} \\ r_{21} & 1 & \cdots & r_{2,p-1} \\ \vdots & \vdots & & \vdots \\ r_{p-1,1} & r_{p-1,2} & \cdots & 1 \end{bmatrix} \tag{11.39a}$$

and $\mathbf{r}_{YX}$ is a $(p - 1) \times 1$ vector containing the coefficients of simple correlation
between the dependent variable and each of the independent variables:

$$\underset{(p-1) \times 1}{\mathbf{r}_{YX}} = \begin{bmatrix} r_{Y1} \\ r_{Y2} \\ \vdots \\ r_{Y,p-1} \end{bmatrix} \tag{11.39b}$$

Formulas (11.7), (11.10), and (11.12) are illustrative of the case $p - 1 = 2$
independent variables.

The ridge standardized regression estimators are obtained by introducing into
the least squares normal equations (11.38) a biasing constant $c \geq 0$, in the
following form:

$$(\mathbf{r}_{XX} + c\mathbf{I})\mathbf{b}^R = \mathbf{r}_{YX} \tag{11.40}$$

where $\mathbf{b}^R$ is the vector of the standardized ridge regression coefficients $b_k^R$:

$$\mathbf{b}^R = \begin{bmatrix} b_1^R \\ b_2^R \\ \vdots \\ b_{p-1}^R \end{bmatrix} \tag{11.41}$$

and $\mathbf{I}$ is the $(p - 1) \times (p - 1)$ identity matrix. Solution of the normal equations
(11.40) yields the ridge standardized regression coefficients:

$$\mathbf{b}^R = (\mathbf{r}_{XX} + c\mathbf{I})^{-1}\mathbf{r}_{YX} \tag{11.42}$$


The  constant  c  reflects  the  amount  of  bias  in  the  estimators.  When  c  =  0, 
(11.42)  reduces  to  the  ordinary  least  squares  regression  coefficients  in  standard¬ 
ized  form,  as  given  in  (7.69).  When  c  >  0,  the  ridge  regression  coefficients  are 
biased  but  tend  to  be  more  stable  (i.e.,  less  variable)  than  ordinary  least  squares 
estimators. 
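
As an illustration of (11.42), the sketch below computes ridge standardized coefficients in Python with NumPy. It works directly from the correlation matrix of the data, which is what the correlation transformation produces. The function name and the synthetic data are assumptions for illustration and are unrelated to the body fat example.

```python
import numpy as np


def ridge_standardized(X, y, c: float) -> np.ndarray:
    """Solve (r_XX + c I) b^R = r_YX, as in (11.40), for the standardized coefficients."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    corr = np.corrcoef(np.column_stack([X, y]), rowvar=False)
    r_xx = corr[:-1, :-1]      # correlations among the X variables
    r_yx = corr[:-1, -1]       # correlations of Y with each X variable
    return np.linalg.solve(r_xx + c * np.eye(r_xx.shape[0]), r_yx)


# Synthetic, highly collinear data (hypothetical).
rng = np.random.default_rng(2)
x1 = rng.normal(size=40)
x2 = x1 + 0.05 * rng.normal(size=40)
y = 1.0 + x1 + x2 + rng.normal(size=40)
X = np.column_stack([x1, x2])

for c in (0.0, 0.01, 0.1):
    print(f"c={c:<5} b^R={np.round(ridge_standardized(X, y, c), 4)}")
```

At c = 0 the function returns the ordinary least squares coefficients in standardized form. For c > 0 the coefficient vector is pulled toward zero, which is the source of both the bias and the added stability.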


Choice of biasing constant c. It can be shown that the bias component of
the total mean squared error of the ridge regression estimator $\mathbf{b}^R$ increases as c
gets larger (with all $b_k^R$ tending toward zero), while at the same time the variance
component becomes smaller. It can further be shown that there always exists
some value c for which the ridge regression estimator $\mathbf{b}^R$ has a smaller total mean
squared error than the ordinary least squares estimator $\mathbf{b}$. The difficulty is that the
optimum value of c varies from one application to another and is unknown.

A  commonly  used  method  of  determining  the  biasing  constant  c  is  based  on 
the  ridge  trace  and  the  variance  inflation  factors  ( VIF)k  in  (11.29).  The  ridge 
trace  is  a  simultaneous  plot  of  the  values  of  the  p  —  1  estimated  ridge  standard¬ 
ized  regression  coefficients  for  different  values  of  c,  usually  between  0  and  1. 




Extensive experience has indicated that an estimated regression coefficient
may fluctuate widely as c is changed slightly from 0, and may even change signs.
Gradually, however, these wide fluctuations cease and the magnitude of the
regression coefficient tends to change only slowly as c is increased further. At
the same time, the value of $(VIF)_k$ tends to fall rapidly as c is changed from 0,
and gradually $(VIF)_k$ also tends to change only moderately as c is increased
further. One therefore examines the ridge trace and the variance inflation factors
and chooses the smallest value of c where it is deemed that the regression coefficients
first become stable in the ridge trace and the variance inflation factors have
become sufficiently small. The choice is thus a judgmental one.
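
The quantities examined in a ridge trace can be tabulated with a short Python sketch using NumPy. The correlations below are hypothetical. The text here does not give a formula for $(VIF)_k$ when c > 0, so the sketch assumes the diagonal of $(\mathbf{r}_{XX} + c\mathbf{I})^{-1}\mathbf{r}_{XX}(\mathbf{r}_{XX} + c\mathbf{I})^{-1}$, which reduces to the diagonal of $\mathbf{r}_{XX}^{-1}$ at c = 0.

```python
import numpy as np


def ridge_trace(r_xx, r_yx, c_values):
    """Return ridge coefficients and ridge VIF values for each biasing constant c."""
    r_xx = np.asarray(r_xx, dtype=float)
    r_yx = np.asarray(r_yx, dtype=float)
    identity = np.eye(r_xx.shape[0])
    coefs, vifs = [], []
    for c in c_values:
        inv = np.linalg.inv(r_xx + c * identity)
        coefs.append(inv @ r_yx)                 # coefficients from (11.42)
        vifs.append(np.diag(inv @ r_xx @ inv))   # assumed ridge VIF formula
    return np.array(coefs), np.array(vifs)


# Hypothetical correlations for two highly correlated X variables.
r_xx = np.array([[1.0, 0.98], [0.98, 1.0]])
r_yx = np.array([0.80, 0.75])
c_grid = [0.0, 0.001, 0.005, 0.01, 0.02, 0.05, 0.1, 0.5, 1.0]

coefs, vifs = ridge_trace(r_xx, r_yx, c_grid)
for c, b, v in zip(c_grid, coefs, vifs):
    print(f"c={c:<6} b^R={np.round(b, 3)}  VIF={np.round(v, 2)}")
```

Plotting the columns of `coefs` against `c_grid`, typically on a logarithmic c scale, would give the ridge trace. In this constructed case the second coefficient starts out negative and turns positive as c grows, while the VIF values fall quickly from their value at c = 0.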


Example.  We  noted  previously  the  severe  multicollinearity  in  the  data  for 
our  body  fat  example.  Indeed,  in  the  model  with  three  independent  variables 
(Table  11.3c,  p.  387),  the  estimated  regression  coefficient  b2  is  negative  even 
though  it  was  expected  that  amount  of  body  fat  is  positively  related  to  thigh 
circumference.  Ridge  regression  calculations  were  made  for  the  body  fat  exam¬ 
ple  data  in  Table  11.2  (calculations  not  shown).  The  ridge  standardized  regres¬ 
sion  coefficients  for  selected  values  of  c  are  presented  in  Table  11.5,  and  the 

TABLE 11.5  Ridge estimated standardized regression coefficients
for different biasing constants c: body fat example

   c       $b_1^R$    $b_2^R$    $b_3^R$
  .000     4.264     -2.929     -1.561
  .001     2.035     -.9408     -.7087
  .002     1.441     -.4113     -.4813
  .003     1.165     -.1661     -.3758
  .004     1.006     -.0248     -.3149
  .005     .9028      .0670     -.2751
  .006     .8300      .1314     -.2472
  .007     .7760      .1791     -.2264
  .008     .7343      .2158     -.2103
  .009     .7012      .2448     -.1975
  .010     .6742      .2684     -.1870
  .020     .5463      .3774     -.1369
  .030     .5004      .4134     -.1181
  .040     .4760      .4302     -.1076
  .050     .4605      .4392     -.1005
  .060     .4494      .4443     -.0952
  .070     .4409      .4472     -.0909
  .080     .4341      .4486     -.0873
  .090     .4283      .4491     -.0841
  .100     .4234      .4490     -.0812
  .200     .3914      .4347     -.0613
  .300     .3703      .4154     -.0479
  .400     .3529      .3966     -.0376
  .500     .3377      .3791     -.0295
  .600     .3240      .3629     -.0229
  .700     .3116      .3481     -.0174
  .800     .3002      .3344     -.0129
  .900     .2896      .3218     -.0091
 1.000     .2798      .3101     -.0059



TABLE 11.6  VIF values for regression coefficients and $R^2$ for
different biasing constants c: body fat example

   c      $(VIF)_1$   $(VIF)_2$   $(VIF)_3$    $R^2$
  .000     708.84      564.34      104.61     .8014
  .001     125.73      100.27       19.28     .7888
  .002      50.56       40.45        8.28     .7852
  .003      27.18       21.84        4.86     .7832
  .004      16.98       13.73        3.36     .7819
  .005      11.64        9.48        2.58     .7809
  .006       8.50        6.98        2.19     .7801
  .007       6.50        5.38        1.82     .7794
  .008       5.15        4.30        1.62     .7787
  .009       4.19        3.54        1.48     .7781
  .010       3.49        2.98        1.38     .7775
  .020       1.10        1.08        1.01     .7726
  .030        .63         .70         .92     .7682
  .040        .45         .56         .88     .7639
  .050        .37         .49         .85     .7597
  .060        .32         .45         .83     .7556
  .070        .30         .42         .81     .7515
  .080        .28         .40         .79     .7475
  .090        .26         .39         .78     .7436
  .100        .25         .37         .76     .7397
  .200        .21         .31         .63     .7031
  .300        .18         .27         .54     .6702
  .400        .17         .24         .46     .6405
  .500        .15         .21         .40     .6134
  .600        .14         .19         .35     .5887
  .700        .13         .18         .31     .5659
  .800        .12         .16         .28     .5449
  .900        .11         .15         .25     .5254
 1.000        .11         .14         .23     .5073

variance inflation factors are given in Table 11.6. The coefficients of multiple
determination $R^2$ are also shown in the latter table. Figure 11.2 presents the ridge
trace of the estimated standardized regression coefficients. To facilitate the analysis,
the horizontal c scale in Figure 11.2 is logarithmic.

Note the instability in Figure 11.2 of the regression coefficients for very small
values of c. The estimated regression coefficient $b_2^R$, in fact, changes signs. Also
note the rapid decrease in the variance inflation factors in Table 11.6. It was
decided to employ c = .02 here because for this value of the biasing constant the
ridge regression coefficients have VIF values near 1 and the estimated regression
coefficients appear to have become reasonably stable. The resulting model for
c = .02 is:


$$\hat{Y}' = .5463X_1' + .3774X_2' - .1369X_3'$$

Transforming back to the original variables by (11.8), as extended to three independent
variables, we obtain:

$$\hat{Y} = -7.3978 + .5553X_1 + .3681X_2 - .1917X_3$$

where $\bar{Y} = 20.195$, $\bar{X}_1 = 25.305$, $\bar{X}_2 = 51.170$, $\bar{X}_3 = 27.620$, $s_Y = 5.106$,
$s_1 = 5.023$, $s_2 = 5.235$, and $s_3 = 3.647$.

FIGURE 11.2  Ridge trace of estimated standardized regression coefficients: body fat
example

The improper sign on the estimate for $\beta_2$ has now been eliminated, and the
estimated regression coefficients are more in line with prior expectations. The
sum of the squared residuals for the transformed variables, which increases with
c, has only increased from .1986 at c = 0 to .2274 at c = .02 while $R^2$ decreased
from .8014 to .7726. These changes are relatively modest. The estimated mean
body fat when $X_{h1} = 25.0$, $X_{h2} = 50.0$, and $X_{h3} = 29.0$ is 19.33 for the ridge
regression at c = .02 compared to 19.19 utilizing the ordinary least squares solution.
Thus, the ridge solution at c = .02 appears to be quite satisfactory here and
a reasonable alternative to the ordinary least squares solution.
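
The back-transformation used in the example above can be retraced with the reported figures. The Python sketch below assumes that (11.8), which is not reproduced in this section, takes the usual form $b_k = (s_Y / s_k) b_k'$ with $b_0 = \bar{Y} - \sum b_k \bar{X}_k$. The inputs are the rounded values printed in the example, so small rounding differences from the published coefficients are to be expected.

```python
# Values reported in the example for c = .02.
b_std = [0.5463, 0.3774, -0.1369]      # ridge standardized coefficients
x_bar = [25.305, 51.170, 27.620]       # means of X1, X2, X3
s_x = [5.023, 5.235, 3.647]            # standard deviations of X1, X2, X3
y_bar, s_y = 20.195, 5.106             # mean and standard deviation of Y

# Assumed form of the back-transformation (see the lead-in).
b = [s_y / s_k * b_k for s_k, b_k in zip(s_x, b_std)]
b0 = y_bar - sum(b_k * xb for b_k, xb in zip(b, x_bar))

# Estimated mean body fat at the X levels used in the example.
x_h = [25.0, 50.0, 29.0]
y_hat = b0 + sum(b_k * x for b_k, x in zip(b, x_h))

print("b0 =", round(b0, 3), " b =", [round(v, 4) for v in b])
print("estimated mean body fat at X_h:", round(y_hat, 2))
```

Under that assumption the slopes agree with the published .5553, .3681, and -.1917 to four decimals, the intercept agrees with -7.3978 to about two decimals, and the estimated mean response rounds to the 19.33 quoted in the example.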

Comments 

1. The normal equations (11.40) for the ridge estimators are as follows:

$$\begin{aligned}
(1 + c)b_1^R + r_{12}b_2^R + \cdots + r_{1,p-1}b_{p-1}^R &= r_{Y1} \\
r_{21}b_1^R + (1 + c)b_2^R + \cdots + r_{2,p-1}b_{p-1}^R &= r_{Y2} \\
&\;\;\vdots \\
r_{p-1,1}b_1^R + r_{p-1,2}b_2^R + \cdots + (1 + c)b_{p-1}^R &= r_{Y,p-1}
\end{aligned} \tag{11.43}$$

where $r_{ij}$ is the coefficient of simple correlation between the ith and jth independent
variables and $r_{Yj}$ is the coefficient of simple correlation between the dependent variable Y
and the jth independent variable.

2.  Ridge  regression  estimates  tend  to  be  stable  in  the  sense  that  they  are  usually  little 
affected  by  small  changes  in  the  data  on  which  the  fitted  regression  is  based.  In  contrast, 
ordinary  least  squares  estimates  may  be  highly  unstable  under  these  conditions  when  the 
independent  variables  are  highly  multicollinear.  Also,  the  ridge  estimated  regression  func¬ 
tion  at  times  will  provide  good  estimates  of  mean  responses  or  predictions  of  new  observa¬ 
tions  for  levels  of  the  independent  variables  outside  the  region  of  the  observations  on 
which  the  regression  function  is  based.  In  contrast,  the  estimated  regression  function 
based  on  ordinary  least  squares  may  perform  quite  poorly  in  such  instances.  Of  course, 
any  estimation  or  prediction  well  outside  the  region  of  the  observations  should  always  be 
made  with  great  caution. 

3.  A  major  limitation  of  ridge  regression  is  that  ordinary  inference  procedures  are  not 
applicable  and  exact  distributional  properties  are  not  known.  Another  limitation  is  that  the 
choice  of  the  biasing  constant  c  is  a  judgmental  one.  While  formal  methods  have  been 
developed  for  making  this  choice,  these  methods  have  their  own  limitations. 

4.  The  ridge  regression  procedures  have  been  generalized  to  allow  for  differing 
biasing  constants  for  the  different  estimated  regression  coefficients. 

Other  remedial  measures 

Still  other  approaches  to  remedying  the  problems  of  multicollinearity  have 
been  developed.  These  include  regression  with  principal  components,  where  the 
independent  variables  are  linear  combinations  of  the  original  independent  varia¬ 
bles,  and  Bayesian  regression,  where  prior  information  about  the  regression  co¬ 
efficients  is  incorporated  into  the  estimation  procedure.  More  information  about 
these  approaches,  as  well  as  about  ridge  regression  and  generalized  ridge  regres¬ 
sion,  may  be  obtained  from  specialized  works  such  as  Reference  11.1. 

11.5  IDENTIFICATION  OF  OUTLYING  OBSERVATIONS 

Frequently in regression analysis applications, the data set contains some observations
which are outlying or extreme, i.e., observations which are well separated
from the remainder of the data. These outlying observations may involve
large residuals and often have dramatic effects on the fitted least squares regression
function. It is therefore important to study the outlying observations carefully
and decide whether they should be retained or eliminated, and if retained,
whether their influence should be reduced in the fitting process and/or the regression
model revised.

An observation may be outlying or extreme with respect to its Y value, its X
value(s), or both. Figure 11.3 illustrates this for the case of regression with a
single independent variable. In the scatter plot in Figure 11.3, observation 1 is
outlying with respect to its Y value. Note that this point falls far outside the
scatter, although its X value is near the middle of the range of the observations on
the independent variable. Observations 2 and 3 are outlying with respect to their
X values since they have much larger X values than those for the other
observations; observation 3 is also outlying with respect to its Y value.




FIGURE  11.3  Scatter  plot  for  regression  with  one  independent  variable 
illustrating  outlying  observations 


Not all outlying observations have a strong influence on the fitted regression
function. Observation 1 in Figure 11.3 may not be too influential because there
are a number of other observations that have similar X values, which will keep
the fitted regression function from being displaced too far by the outlying
observation. Likewise, observation 2 may not be too influential because its Y value is
consistent with the regression relation displayed by the nonextreme observations.
Observation 3, on the other hand, is likely to be very influential in affecting the
fit of the regression function because it is outlying with regard to its X value, and
its Y value is not consistent with the regression relation for the other
observations.

In the case of regression with one or two independent variables, it is relatively
simple to identify outlying observations with respect to their X or Y values and to
study whether or not they are influential in affecting the fitted regression
function. When more than two independent variables are included in the
regression model, however, the identification of outlying observations by simple
graphic means, such as scatter and residual plots, becomes difficult and more
powerful methods are required. We now discuss some methods for identifying
observations that are outlying with respect to their X or Y values.

Use  of  hat  matrix  for  identifying  outlying  X  observations 

We encountered the hat matrix $\mathbf{H}$ in Chapter 6 where we noted in (6.93) that
the least squares residuals can be expressed as a linear combination of the
observations $Y_i$ by means of the hat matrix:

(11.44)  $\mathbf{e} = (\mathbf{I} - \mathbf{H})\mathbf{Y}$

The hat matrix $\mathbf{H}$ is given by (6.93a):

(11.45)  $\underset{n \times n}{\mathbf{H}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$

Similarly, the fitted values $\hat{Y}_i$ can be expressed as linear combinations of the
observations $Y_i$ through the hat matrix:

(11.46)  $\hat{\mathbf{Y}} = \mathbf{H}\mathbf{Y}$




Further, we noted in (6.95) that the variances and covariances of the residuals
involve the hat matrix:

(11.47)  $\boldsymbol{\sigma}^2(\mathbf{e}) = \sigma^2(\mathbf{I} - \mathbf{H})$

so that the variance of residual $e_i$, denoted by $\sigma^2(e_i)$, is:

(11.48)  $\sigma^2(e_i) = \sigma^2(1 - h_{ii})$

where $h_{ii}$ is the ith element on the main diagonal of the hat matrix.

The diagonal element $h_{ii}$ of the hat matrix can be obtained directly from:

(11.49)  $h_{ii} = \mathbf{X}_i'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_i$


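where $\mathbf{X}_i$ corresponds to the $\mathbf{X}_h$ vector in (7.45) except that $\mathbf{X}_i$ pertains to the ith
sample observation:

(11.49a)  $\underset{p \times 1}{\mathbf{X}_i} = \begin{bmatrix} 1 \\ X_{i1} \\ \vdots \\ X_{i,p-1} \end{bmatrix}$

Note that $\mathbf{X}_i'$ is simply the ith row of the $\mathbf{X}$ matrix, pertaining to the ith sample
observation.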

The diagonal elements $h_{ii}$ have some useful properties:

(11.50)  $0 \le h_{ii} \le 1 \qquad \sum_{i=1}^{n} h_{ii} = p$

where p is the number of regression parameters in the regression function
including the intercept term.
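The relations (11.44) through (11.50) can be checked numerically. The following is a minimal sketch in plain Python, assuming a made-up data set with one independent variable plus an intercept (so p = 2); the data values and the helper name `quad` are hypothetical and are not taken from the text.

```python
# Hypothetical data: one predictor plus an intercept, so p = 2.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 10.5]
n, p = len(x), 2

# Rows of the X matrix: X_i' = [1, x_i].
X = [[1.0, xi] for xi in x]

# X'X is 2 x 2 here, so its inverse can be written out directly.
s0, s1, s2 = float(n), sum(x), sum(xi * xi for xi in x)
det = s0 * s2 - s1 * s1
xtx_inv = [[s2 / det, -s1 / det], [-s1 / det, s0 / det]]


def quad(u, v):
    """Return u' (X'X)^{-1} v for two 2-element vectors."""
    return sum(u[a] * xtx_inv[a][b] * v[b] for a in range(2) for b in range(2))


# Hat matrix (11.45): H[i][j] = X_i' (X'X)^{-1} X_j.
H = [[quad(X[i], X[j]) for j in range(n)] for i in range(n)]

# Leverage values (11.49) are the diagonal elements h_ii.
leverage = [H[i][i] for i in range(n)]

# Fitted values (11.46) and residuals (11.44).
fitted = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
resid = [y[i] - fitted[i] for i in range(n)]

print("leverage values:", [round(h, 3) for h in leverage])
print("sum of leverage values:", round(sum(leverage), 6), "(p =", p, ")")
```

In this made-up data set the last observation sits far from the other X values, so it carries most of the leverage, while the leverage values still sum to p as (11.50) requires.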

The diagonal element $h_{ii}$ in the hat matrix is called the leverage (in terms of
the X values) of the ith observation. It indicates whether or not the X values for
the ith observation are outlying, because it can be shown that $h_{ii}$ is a measure of
the distance between the X values for the ith observation and the means of the X
values for all n observations. Thus, a large leverage value $h_{ii}$ indicates that the ith
observation is distant from the center of the X observations. Figure 11.4
illustrates this for the case of two independent variables. Observation 1 is distant
from the center $(\bar{X}_1, \bar{X}_2)$ and has a large leverage value $h_{11} = .812$ while
observation 2 is near the center and has a small leverage value $h_{22} = .253$.

If the ith observation is an outlying X observation, i.e., one with a large
leverage value $h_{ii}$, it exercises substantial leverage in determining the fitted
value $\hat{Y}_i$. This is so for the following reasons:

1.  The fitted value $\hat{Y}_i$ is a linear combination of the observed Y values, as
shown in (11.46), and $h_{ii}$ is the weight of observation $Y_i$ in determining this
fitted value. Thus, the larger is $h_{ii}$, the more important is $Y_i$ in determining
$\hat{Y}_i$. Remember that $h_{ii}$ is a function only of the X values, so $h_{ii}$ measures the
role of the X values in determining how important $Y_i$ is in affecting the fitted
value $\hat{Y}_i$.

2.  The larger is $h_{ii}$, the smaller is the variance of the residual $e_i$, as may be seen
from (11.48). Hence, the larger is $h_{ii}$, the closer the fitted value $\hat{Y}_i$ will tend
to be to the observed value $Y_i$. In the extreme case where $h_{ii} = 1$, $\sigma^2(e_i) = 0$
so that the fitted value $\hat{Y}_i$ is then forced to equal the observed value $Y_i$. Since
observations with high leverage tend to have smaller residuals, it may not be
possible to detect them by an examination of the residuals alone.


FIGURE 11.4  Illustration of observations with X values near and far from
center


A leverage value $h_{ii}$ is usually considered to be large if it is more than twice as
large as the mean leverage value, denoted by $\bar{h}$, which according to (11.50) is:

(11.51)  $\bar{h} = \frac{\sum_{i=1}^{n} h_{ii}}{n} = \frac{p}{n}$

Hence, leverage values greater than $2p/n$ are considered by this rule to indicate
outlying observations with regard to the X values. Additional evidence of an
extreme observation is the existence of a gap between the leverage values for
most of the observations and the unusually large leverage value(s).


Example.  We continue with the body fat example, using only the two
independent variables triceps skinfold thickness ($X_1$) and thigh circumference ($X_2$) so
that the results using the hat matrix can be compared to simple graphic plots. The
data for this example were presented earlier in Table 8.3 (p. 275). Figure 11.5
contains a scatter plot of $X_2$ against $X_1$, where the observations are shown by
their observation number. We note from Figure 11.5 that observations 15 and 3
appear to be outlying ones with respect to the pattern of the X values.
Observation 15 is outlying for $X_1$ and at the low end of the range for $X_2$, while
observation 3 is outlying in terms of the pattern of multicollinearity, though it is not
outlying for either of the independent variables separately. Observations 1 and 5
also appear to be somewhat extreme.


FIGURE 11.5  Scatter plot of thigh circumference against triceps skinfold thickness,
body fat example

Calculation of the hat matrix (11.45) confirms these impressions. Table 11.7,
column 2, contains the leverage values $h_{ii}$ for the body fat example. Note that the
two largest leverage values are $h_{33} = .372$ and $h_{15,15} = .333$. Both exceed the
criterion of twice the mean leverage value, $2p/n = 2(3)/20 = .30$, and are
separated by a substantial gap from the next largest leverage values $h_{55} = .248$ and
$h_{11} = .201$. Having identified observations 3 and 15 as outlying observations in
terms of their X values, we shall need to ascertain how influential these
observations are in the fitting of the regression function. We consider this question after
taking up the identification of outlying Y observations.
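The $2p/n$ rule is simple to apply once the leverage values are in hand. The sketch below, in plain Python, uses the leverage values listed in column 2 of Table 11.7; the variable names are illustrative only.

```python
# Leverage values h_ii for the 20 body fat observations (Table 11.7, column 2).
h = [.201, .059, .372, .111, .248, .129, .156, .096, .115, .110,
     .120, .109, .178, .148, .333, .095, .106, .197, .067, .050]

n = len(h)   # 20 observations
p = 3        # intercept plus two independent variables

# Rule of thumb: a leverage value is large if it exceeds twice the mean, 2p/n.
cutoff = 2 * p / n
flagged = [i + 1 for i, h_ii in enumerate(h) if h_ii > cutoff]

print("cutoff 2p/n =", cutoff)
print("observations with large leverage:", flagged)
print("sum of leverage values:", round(sum(h), 3))
```

The leverage values in the table also add up to p = 3, as (11.50) requires, which is a quick consistency check on any computed hat matrix.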


Use  of  studentized  deleted  residuals  for  identifying 
outlying  Y  observations 

The detection of outlying or extreme Y observations based on an examination
of the residuals has been considered in earlier chapters. We utilized there either
the residuals $e_i$:

(11.52)  $e_i = Y_i - \hat{Y}_i$

or the standardized residuals:

(11.53)  $\frac{e_i}{\sqrt{MSE}}$



TABLE 11.7  Residuals, diagonal elements of the hat matrix, studentized
deleted residuals, and Cook’s distances: body fat example

         (1)        (2)        (3)        (4)
  i     $e_i$    $h_{ii}$    $d_i^*$     $D_i$
  1    -1.683      .201       -.730      .046
  2     3.643      .059       1.534      .046
  3    -3.176      .372      -1.656      .490
  4    -3.158      .111      -1.348      .072
  5      .000      .248        .000      .000
  6     -.361      .129       -.148      .001
  7      .716      .156        .298      .006
  8     4.015      .096       1.760      .098
  9     2.655      .115       1.117      .053
 10    -2.475      .110      -1.034      .044
 11      .336      .120        .137      .001
 12     2.226      .109        .923      .035
 13    -3.947      .178      -1.825      .211
 14     3.447      .148       1.524      .125
 15      .571      .333        .267      .013
 16      .642      .095        .258      .002
 17     -.851      .106       -.344      .005
 18     -.783      .197       -.335      .010
 19    -2.857      .067      -1.176      .032
 20     1.040      .050        .409      .003

We introduce now two refinements to make the analysis of residuals more
effective for identifying outlying Y observations.

When the residuals $e_i$ have substantially different variances $\sigma^2(e_i)$, as given
in (11.48), it is better to consider the magnitude of $e_i$ relative to $\sigma(e_i)$ instead of
$\sqrt{MSE}$ to give recognition to differences in their sampling errors. Since by
(11.48) we have:

$\sigma^2(e_i) = \sigma^2(1 - h_{ii})$

an unbiased estimator of this variance is:

(11.54)  $s^2(e_i) = MSE(1 - h_{ii})$

The ratio of $e_i$ to $s(e_i)$ is called the studentized residual and will be denoted by
$e_i^*$:

(11.55)  $e_i^* = \frac{e_i}{s(e_i)}$

Note that the residuals $e_i$ will have substantially different sampling variations if
the leverage values $h_{ii}$ differ markedly, but the studentized residuals have
constant variance (when the model is appropriate).
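As a small illustration of (11.54) and (11.55), the sketch below computes studentized residuals for a few body fat observations in plain Python. The residuals and leverage values are those of Table 11.7, and SSE = 109.95 is the value quoted from Table 8.4a in this section; the dictionary layout and names are only illustrative.

```python
import math

n, p = 20, 3
sse = 109.95              # error sum of squares quoted from Table 8.4a
mse = sse / (n - p)       # about 6.47

# (e_i, h_ii) pairs for a few observations, taken from Table 11.7.
obs = {1: (-1.683, .201), 3: (-3.176, .372), 8: (4.015, .096), 15: (.571, .333)}

student = {}
for i, (e_i, h_ii) in obs.items():
    s_e = math.sqrt(mse * (1 - h_ii))   # s(e_i) from (11.54)
    student[i] = e_i / s_e              # studentized residual (11.55)

for i in sorted(student):
    standardized = obs[i][0] / math.sqrt(mse)   # e_i / sqrt(MSE), as in (11.53)
    print(i, round(standardized, 3), round(student[i], 3))
```

For a high-leverage case such as observation 3 the studentized residual is noticeably larger in absolute value than the standardized residual, because $s(e_i)$ shrinks as $h_{ii}$ grows.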

The second refinement is to measure the ith residual $e_i = Y_i - \hat{Y}_i$ when the
fitted regression is based on the observations excluding the ith one. In this way,
the fitted value $\hat{Y}_i$ cannot be influenced by the ith observation to be close to $Y_i$
because this observation is not part of the data set on which the fitted value is
based. Thus, the ith observation is deleted, the regression function is fitted to the
remaining $n - 1$ observations, and the point estimate of the expected value when
the X levels are those of the ith observation, to be denoted by $\hat{Y}_{i(i)}$, will be
compared with the actual $Y_i$ observed value. The notation $\hat{Y}_{i(i)}$ reminds us that the
ith observation was omitted when fitting the regression function. The residual:

(11.56)  $d_i = Y_i - \hat{Y}_{i(i)}$

is called a deleted residual and is denoted by $d_i$.

Note  that  a  deleted  residual  corresponds  to  the  prediction  error  in  the  numera¬ 
tor  of  (3.34)  when  predicting  a  new  observation  from  the  fitted  regression  func¬ 
tion  based  on  earlier  observations,  except  that  in  (3.34)  the  difference  considered 
is  YU)  —  Yt  and  the  notation  differs  from  the  present  one.  Hence,  we  know  from 
(7.55a) that the estimated variance of $d_i$ is:

(11.57)  $s^2(d_i) = MSE_{(i)}\left(1 + \mathbf{X}_i'(\mathbf{X}_{(i)}'\mathbf{X}_{(i)})^{-1}\mathbf{X}_i\right)$

where $\mathbf{X}_i$ is the X observations vector (11.49a) for the ith observation, $MSE_{(i)}$ is
the mean square error when the ith observation is omitted in fitting the regression
function, and $\mathbf{X}_{(i)}$ is the X matrix with the ith observation deleted. Also, it
follows from (7.55) that:

(11.58)  $\frac{d_i}{s(d_i)} \sim t(n - p - 1)$

Remember that $n - 1$ observations are used here in predicting the ith
observation; hence, the degrees of freedom are $(n - 1) - p = n - p - 1$.

Combining the two refinements, we shall use for diagnosis of outlying or
extreme Y observations the deleted residual $d_i$ in (11.56) and studentize it by
dividing it by its estimated standard deviation given by (11.57). The studentized
deleted residual, to be denoted by $d_i^*$, therefore is:

(11.59)  $d_i^* = \frac{d_i}{s(d_i)}$

We know from (11.58) that each studentized deleted residual $d_i^*$ follows the t
distribution with $n - p - 1$ degrees of freedom. The $d_i^*$, however, are not
independent.

Fortunately, the studentized deleted residuals $d_i^*$ in (11.59) can be calculated
without having to fit regression functions with the ith observation omitted. It can
be shown that an algebraically equivalent expression for $d_i^*$ is:

(11.59a)  $d_i^* = e_i\left[\frac{n - p - 1}{SSE(1 - h_{ii}) - e_i^2}\right]^{1/2}$

Thus, the studentized deleted residual $d_i^*$ can be calculated from the residual $e_i$,
the error sum of squares SSE, and the leverage value $h_{ii}$, all for the fitted
regression based on the n observations.
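The step from (11.59) to (11.59a) can be traced in a few lines. The sketch below relies on three standard deletion identities that are not stated in this section, so they should be read as assumptions here: the deleted residual equals $e_i/(1 - h_{ii})$, its estimated variance equals $MSE_{(i)}/(1 - h_{ii})$, and the full-data SSE splits into the deleted-fit SSE plus $e_i^2/(1 - h_{ii})$.

```latex
% Assumed identities (standard results, not derived in this section):
%   d_i = e_i / (1 - h_{ii})
%   s^2(d_i) = MSE_{(i)} / (1 - h_{ii})
%   SSE = (n - p - 1)\,MSE_{(i)} + e_i^2 / (1 - h_{ii})
\begin{align*}
d_i^* = \frac{d_i}{s(d_i)}
  &= \frac{e_i/(1 - h_{ii})}{\sqrt{MSE_{(i)}/(1 - h_{ii})}}
   = \frac{e_i}{\sqrt{MSE_{(i)}(1 - h_{ii})}} \\[4pt]
MSE_{(i)}(1 - h_{ii})
  &= \frac{SSE - e_i^2/(1 - h_{ii})}{n - p - 1}\,(1 - h_{ii})
   = \frac{SSE(1 - h_{ii}) - e_i^2}{n - p - 1} \\[4pt]
d_i^* &= e_i\left[\frac{n - p - 1}{SSE(1 - h_{ii}) - e_i^2}\right]^{1/2}
\end{align*}
```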

To identify outlying Y observations, we examine the studentized deleted
residuals for large absolute values and use the appropriate t distribution to ascertain
how far in the tails such outlying values fall.
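The agreement between the definition (11.59) and the shortcut (11.59a) can be checked by brute force. The following sketch, in plain Python, uses a made-up data set with a single independent variable (p = 2) so that the fits can be written in closed form. The data, the helper `fit_line`, and the simple regression form of the leverage, $h_{ii} = 1/n + (X_i - \bar{X})^2 / \sum (X_j - \bar{X})^2$, are assumptions introduced for illustration.

```python
import math


def fit_line(x, y):
    """Least squares fit of y on x; returns b0, b1, SSE, mean of x, and Sxx."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, sse, xbar, sxx


# Hypothetical data with one high-leverage point at x = 12.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 12.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.3, 10.0]
n, p = len(x), 2

b0, b1, sse, xbar, sxx = fit_line(x, y)

shortcut, brute = [], []
for i in range(n):
    # Shortcut (11.59a): only quantities from the fit to all n observations.
    e_i = y[i] - (b0 + b1 * x[i])
    h_ii = 1 / n + (x[i] - xbar) ** 2 / sxx
    shortcut.append(e_i * math.sqrt((n - p - 1) / (sse * (1 - h_ii) - e_i ** 2)))

    # Brute force: delete observation i, refit, and apply (11.56), (11.57), (11.59).
    xd, yd = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
    c0, c1, sse_d, xbar_d, sxx_d = fit_line(xd, yd)
    d_i = y[i] - (c0 + c1 * x[i])
    mse_d = sse_d / (n - 1 - p)
    var_d = mse_d * (1 + 1 / (n - 1) + (x[i] - xbar_d) ** 2 / sxx_d)
    brute.append(d_i / math.sqrt(var_d))

for i in range(n):
    print(i + 1, round(brute[i], 4), round(shortcut[i], 4))
```

The two columns printed should agree to rounding, which is why n separate refits are unnecessary in practice.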




Example.  We illustrate the calculation of studentized deleted residuals for
the first observation in the body fat example. The X values for this observation
are $X_{11} = 19.5$ and $X_{12} = 43.1$. Using the fitted regression function from Table
8.4a, we obtain:

$\hat{Y}_1 = -19.174 + .2224(19.5) + .6594(43.1) = 13.583$

Since $Y_1 = 11.9$, the residual for this observation is $e_1 = 11.9 - 13.583 =
-1.683$. We also know from Table 8.4a that SSE = 109.95 and from Table 11.7
that $h_{11} = .201$. Hence, by (11.59a), we find:

$d_1^* = -1.683\left[\frac{20 - 3 - 1}{109.95(1 - .201) - (-1.683)^2}\right]^{1/2} = -.730$

The studentized deleted residuals for all 20 observations are shown in column 3
of Table 11.7.

Note that observations 3, 8, and 13 have the largest absolute studentized
deleted residuals. If we consider tail areas of .05 on each side to be extreme, we
will need to compare the absolute values of the studentized deleted residuals with
$t(.95; 16) = 1.746$. Based on this comparison, we should consider observations
8 and 13 extreme enough to warrant studying whether or not they are influential
observations. Incidentally, consideration of the residuals $e_i$ (shown in Table
11.7, column 1) here would also have identified observations 8 and 13 as the
most outlying ones.
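The whole of column 3 in Table 11.7 can be reproduced from columns 1 and 2 with (11.59a). A minimal sketch in plain Python follows; the critical value 1.746 is the $t(.95; 16)$ figure quoted in the text rather than something computed here, and the variable names are illustrative.

```python
import math

# Columns 1 and 2 of Table 11.7: residuals e_i and leverage values h_ii.
e = [-1.683, 3.643, -3.176, -3.158, .000, -.361, .716, 4.015, 2.655, -2.475,
     .336, 2.226, -3.947, 3.447, .571, .642, -.851, -.783, -2.857, 1.040]
h = [.201, .059, .372, .111, .248, .129, .156, .096, .115, .110,
     .120, .109, .178, .148, .333, .095, .106, .197, .067, .050]

n, p = 20, 3
sse = 109.95       # from Table 8.4a, as quoted in the text
t_crit = 1.746     # t(.95; 16), as quoted in the text

# Studentized deleted residuals by the shortcut formula (11.59a).
d_star = [e_i * math.sqrt((n - p - 1) / (sse * (1 - h_ii) - e_i ** 2))
          for e_i, h_ii in zip(e, h)]

flagged = [i + 1 for i, d in enumerate(d_star) if abs(d) > t_crit]

for i, d in enumerate(d_star, start=1):
    print(i, round(d, 3))
print("observations beyond t(.95; 16):", flagged)
```

Because the inputs are rounded to three decimals, the computed values can differ from the tabled ones in the last digit.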


11.6  IDENTIFICATION  OF  INFLUENTIAL  OBSERVATIONS 
AND  REMEDIAL  MEASURES 

After identifying outlying observations with respect to their X values and/or
their Y values, the next step is to ascertain whether or not they are influential in
affecting the fit of the regression function, possibly leading to serious distortion
effects.

One measure of the influence of the ith observation on the fit of the regression
function is the difference between the vector $\mathbf{b}$ of the estimated regression
coefficients based on all n observations and the vector $\mathbf{b}_{(i)}$ based on the $n - 1$
observations with the ith observation deleted:

(11.60)  $\mathbf{b} - \mathbf{b}_{(i)}$

Another possible measure of the influence of the ith observation is the difference
between the fitted value $\hat{Y}_i$ based on the regression with all n observations and the
fitted value $\hat{Y}_{i(i)}$ obtained when the ith observation is deleted:

(11.61)  $\hat{Y}_i - \hat{Y}_{i(i)}$


Cook’s  distance  measure 

An overall measure of the impact of the ith observation on the estimated
regression coefficients is Cook’s distance measure $D_i$. Recall from (7.43) that
the boundary of the confidence region for all p regression coefficients
$\beta_k$ $(k = 0, 1, \ldots, p - 1)$ is given by:

(11.62)  $\frac{(\mathbf{b} - \boldsymbol{\beta})'\mathbf{X}'\mathbf{X}(\mathbf{b} - \boldsymbol{\beta})}{p\,MSE} = F(1 - \alpha; p, n - p)$

Cook’s distance measure $D_i$ uses the same structure for measuring the combined
impact of the differences in the estimated regression coefficients when the ith
observation is deleted:

(11.63)  $D_i = \frac{(\mathbf{b} - \mathbf{b}_{(i)})'\mathbf{X}'\mathbf{X}(\mathbf{b} - \mathbf{b}_{(i)})}{p\,MSE}$

While $D_i$ does not follow the F distribution, it has been found useful to relate the
value $D_i$ to the corresponding F distribution according to (11.62) and ascertain
the percentile value. If the percentile value is less than about 10 or 20 percent,
the ith observation has little apparent influence on the fitted regression function.
If, on the other hand, the percentile value is near 50 percent or more, the distance
between the vectors $\mathbf{b}$ and $\mathbf{b}_{(i)}$ should be considered large, implying that the ith
observation has a substantial influence on the fit of the regression function.

Fortunately, Cook’s distance measure $D_i$ can be calculated without fitting new
regression functions where the ith observation is deleted. An algebraically
equivalent expression is:

(11.63a)  $D_i = \frac{e_i^2}{p\,MSE}\left[\frac{h_{ii}}{(1 - h_{ii})^2}\right]$

Note from (11.63a) that $D_i$ depends on two factors: (1) the size of the residual $e_i$
and (2) the leverage value $h_{ii}$. The larger is either $|e_i|$ or $h_{ii}$, the larger is $D_i$. Thus,
the ith observation can be influential: (1) by having a large residual $e_i$ and only a
moderate leverage value $h_{ii}$, or (2) by having a large leverage value $h_{ii}$ with only
a moderately sized residual $e_i$, or (3) by having both a large residual $e_i$ and a
large leverage value $h_{ii}$.
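The equivalence of (11.63) and (11.63a) can be confirmed numerically. The sketch below, in plain Python, refits a simple linear regression (p = 2) with each observation deleted in turn and compares the two expressions. The data set, the helper `fit_line`, and the closed-form leverage for simple regression are assumptions made for illustration, not part of the text.

```python
def fit_line(x, y):
    """Least squares intercept and slope for y regressed on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1


# Hypothetical data with one high-leverage point at x = 12.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 12.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.3, 10.0]
n, p = len(x), 2

b0, b1 = fit_line(x, y)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
mse = sum(r * r for r in resid) / (n - p)

xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

# Entries of X'X for the design with rows [1, x_i].
s0, s1, s2 = float(n), sum(x), sum(xi * xi for xi in x)

by_deletion, by_shortcut = [], []
for i in range(n):
    # (11.63): refit without observation i and measure the shift in b.
    c0, c1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    d0, d1 = b0 - c0, b1 - c1
    quad_form = s0 * d0 * d0 + 2 * s1 * d0 * d1 + s2 * d1 * d1
    by_deletion.append(quad_form / (p * mse))

    # (11.63a): residual and leverage from the full fit only.
    h_ii = 1 / n + (x[i] - xbar) ** 2 / sxx
    by_shortcut.append(resid[i] ** 2 / (p * mse) * h_ii / (1 - h_ii) ** 2)

for i in range(n):
    print(i + 1, round(by_deletion[i], 4), round(by_shortcut[i], 4))
```

In this made-up example the point at x = 12 combines high leverage with a sizable residual, which is case (3) in the discussion above.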


Example.  In the body fat example, we had identified observations 3 and 15
as outlying X observations and observations 8 and 13 as outlying Y observations.
We now calculate Cook’s distance measure for each of these observations. To
illustrate the calculations, we shall consider observation 3. From Table 11.7, we
have $e_3 = -3.176$ and $h_{33} = .372$; and from Table 8.4a, we have MSE = 6.47.
Since p = 3, we obtain using (11.63a):

$D_3 = \frac{(-3.176)^2}{3(6.47)} \cdot \frac{.372}{(1 - .372)^2} = .490$

The distance measures for all of the observations are presented in Table 11.7,
column 4. We note from column 4 that observation 3 clearly is the most
influential observation, with the next largest distance measure $D_{13} = .211$ being
substantially smaller.
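Column 4 of Table 11.7 follows from columns 1 and 2 by (11.63a). The short sketch below, in plain Python, repeats the calculation for all 20 observations using MSE = 6.47 as quoted in the text; the variable names are illustrative.

```python
# Columns 1 and 2 of Table 11.7: residuals e_i and leverage values h_ii.
e = [-1.683, 3.643, -3.176, -3.158, .000, -.361, .716, 4.015, 2.655, -2.475,
     .336, 2.226, -3.947, 3.447, .571, .642, -.851, -.783, -2.857, 1.040]
h = [.201, .059, .372, .111, .248, .129, .156, .096, .115, .110,
     .120, .109, .178, .148, .333, .095, .106, .197, .067, .050]

p = 3
mse = 6.47   # from Table 8.4a, as quoted in the text

# Cook's distance by the shortcut formula (11.63a).
cook = [e_i ** 2 / (p * mse) * h_ii / (1 - h_ii) ** 2 for e_i, h_ii in zip(e, h)]

most_influential = max(range(len(cook)), key=lambda i: cook[i]) + 1

for i, d in enumerate(cook, start=1):
    print(i, round(d, 3))
print("largest Cook's distance at observation", most_influential)
```

Note that observation 15, despite its large leverage, has a small distance measure because its residual is small, while observation 3 combines large leverage with a large residual.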




To assess the magnitude of $D_3 = .490$, we refer to the corresponding F
distribution, namely, $F(p, n - p) = F(3, 17)$. It can be shown that .490 is about the
31st percentile of this distribution. Hence, it appears that observation 3 does
influence the regression fit, but the extent of the influence may not be large
enough to call for consideration of remedial measures.
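The percentile quoted above can be checked without statistical tables. The sketch below is one possible approach in plain Python: it uses the standard relation between the F distribution function and the regularized incomplete beta function, and evaluates the integral with a simple midpoint rule. The function name `f_cdf` and the step count are arbitrary choices, and a library routine such as SciPy's `scipy.stats.f.cdf` would normally be preferred.

```python
import math


def f_cdf(f_value, df1, df2, steps=100_000):
    """Approximate P(F <= f_value) for an F(df1, df2) variable.

    Uses P = I_x(df1/2, df2/2) with x = df1*F / (df1*F + df2), where I_x is
    the regularized incomplete beta function, integrated by the midpoint rule.
    Intended for df1 >= 2, where the integrand is bounded at zero.
    """
    a, b = df1 / 2.0, df2 / 2.0
    x = df1 * f_value / (df1 * f_value + df2)
    width = x / steps
    area = 0.0
    for k in range(steps):
        t = (k + 0.5) * width
        area += t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)
    area *= width
    beta = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return area / beta


# Percentile of D_3 = .490 in the F(3, 17) distribution.
percentile = f_cdf(0.490, 3, 17)
print("percentile of D_3:", round(100 * percentile, 1))
```

The result falls between the 10 to 20 percent range that signals little influence and the 50 percent level that signals substantial influence, consistent with the hedged conclusion in the text.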

Additional insights about the extent of the influence of observation 3 may be
obtained by comparing the fitted value $\hat{Y}_3$ when all observations are utilized with
the fitted value $\hat{Y}_{3(3)}$ when observation 3 is deleted. It can be shown that $\hat{Y}_3 =
21.877$ and $\hat{Y}_{3(3)} = 23.756$, so that omission of observation 3 increases the fitted
value by 8.6 percent. This change indicates that observation 3, although it
influences the fitted regression, may not play such a strong role as to require
consideration of remedial measures.


Comments 


1.  Cook’s distance measure $D_i$ may be viewed as reflecting in the aggregate the
differences between the fitted values for each observation when all n observations are
used in the data base and the fitted values when the ith observation is deleted, since it can
be shown that an equivalent expression for $D_i$ is:

(11.64)  $D_i = \frac{(\hat{\mathbf{Y}} - \hat{\mathbf{Y}}_{(i)})'(\hat{\mathbf{Y}} - \hat{\mathbf{Y}}_{(i)})}{p\,MSE}$

Here, $\hat{\mathbf{Y}}$ as usual is the vector of the fitted values when all n observations are used in the
data base for the regression fit and $\hat{\mathbf{Y}}_{(i)}$ is the vector of the fitted values when the ith
observation is deleted from the data base.

2.  Analysis  of  outlying  and  influential  observations  is  a  necessary  component  of  good 
regression  analysis.  However,  it  is  neither  automatic  nor  foolproof  and  requires  good 
judgment  by  the  analyst.  The  methods  which  have  been  described  often  work  well  but  at 
other  times  will  be  ineffective.  For  example,  if  two  influential  outlying  observations  are 
nearly  coincident,  an  analysis  which  deletes  one  observation  at  a  time  and  estimates  the 
change  in  fit  will  result  in  virtually  no  change  for  these  two  outlying  observations.  The 
reason  is  that  the  retained  outlying  observation  will  mask  the  effect  of  the  deleted  outlying 
observation. 


Remedial  measures 

After using the hat matrix, studentized deleted residuals, and Cook’s
distances to identify outlying influential observations that have a substantial impact
on the least squares regression fit, one must decide what to do about such
observations. Clearly, an outlying influential observation should not be automatically
discarded, because it may be entirely correct and simply represent an unlikely
event. Discarding of such an outlying observation could lead to the undesirable
consequence of increased variances of some of the estimated regression
coefficients.

If, on the other hand, the circumstances surrounding the data provide an
explanation of the unusual observation which indicates an exceptional situation
not to be covered by the model, the discarding of the observation may be
appropriate. Thus, when an outlying influential observation can definitely be shown to
be the result of a gross measurement error, it would be appropriate to discard that
observation.

When the outlying influential observation is accurate, it may not represent an
unlikely event but rather a failure of the model. The failure may be either the
omission of an important independent variable or the choice of an incorrect
functional form, such as omission of a curvature effect for an independent
variable included in the model. Often, identification of outlying influential
observations leads to valuable insights for strengthening the model.

When  an  outlying  influential  observation  is  accurate  but  no  explanation  can  be 
found  for  it,  a  less  severe  alternative  than  discarding  the  observation  is  to  dampen 
its  influence.  We  shall  now  discuss  one  method  of  doing  this. 

Method of least absolute deviations.  This method is one of a variety of
robust methods that have the property of being insensitive to both outlying data
values and inadequacies of the model employed. The method of least absolute
deviations estimates the regression coefficients by minimizing the sum of the
absolute deviations of the observations from their means. The criterion to be
minimized is:

(11.65)  $\sum_{i=1}^{n} \left| Y_i - (\beta_0 + \beta_1 X_{i1} + \cdots + \beta_{p-1} X_{i,p-1}) \right|$

Since  absolute  deviations  rather  than  squared  ones  are  involved  here,  the  method 
of  least  absolute  deviations  places  less  emphasis  on  outlying  observations  than 
does  the  method  of  least  squares. 

The estimated regression coefficients according to the method of least
absolute deviations can be obtained by linear programming techniques. Details about
the computational aspects may be found in specialized texts, such as Reference
11.2.
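To give a feel for how the criterion (11.65) damps an outlier, the sketch below fits a line by least absolute deviations in plain Python. It does not use linear programming as the text describes; instead it substitutes iteratively reweighted least squares, a simple approximation that repeatedly solves a weighted least squares problem with weights equal to the reciprocals of the current absolute residuals. The data, the function names, the iteration count, and the residual floor are all assumptions made for illustration.

```python
def weighted_line(x, y, w):
    """Weighted least squares intercept and slope for y regressed on x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def lad_line(x, y, iterations=200, floor=1e-10):
    """Approximate least absolute deviations fit by reweighted least squares."""
    b0, b1 = weighted_line(x, y, [1.0] * len(x))   # start from ordinary least squares
    for _ in range(iterations):
        # Weight 1/|residual| turns squared deviations into absolute deviations;
        # the floor avoids division by zero for points fitted exactly.
        w = [1.0 / max(abs(yi - b0 - b1 * xi), floor) for xi, yi in zip(x, y)]
        b0, b1 = weighted_line(x, y, w)
    return b0, b1


# Hypothetical data: points on the line Y = 2 + 3X, with one gross outlier.
x = [float(k) for k in range(9)]
y = [2.0 + 3.0 * xi for xi in x]
y[7] = 60.0   # the value on the line would be 23

ols = weighted_line(x, y, [1.0] * len(x))
lad = lad_line(x, y)

print("least squares fit:           b0 = %.3f, b1 = %.3f" % ols)
print("least absolute deviations:   b0 = %.3f, b1 = %.3f" % lad)
```

In this made-up example the least squares line is pulled well away from the eight consistent points, while the least absolute deviations fit stays on them. An exact solution by linear programming would require an LP solver and is not attempted here.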

Example.  In the body fat example, observation 3 was identified as having a
substantial influence on the fitted regression function. If observation 3 were
deleted, the fitted regression function would be:

$\hat{Y} = -12.428 + .5641X_1 + .3635X_2$

An alternative to deleting observation 3 would be to reduce its influence. Using
the method of least absolute deviations, the fitted regression function is:

$\hat{Y} = -17.027 + .4113X_1 + .5203X_2$

Since the fitted regression based on all observations is (Table 8.4a):

$\hat{Y} = -19.174 + .2224X_1 + .6594X_2$

it is seen that the method of least absolute deviations leads to more modest
changes than dropping observation 3 entirely. An analysis of the residuals shows
that the method of least absolute deviations resulted in reductions of those
residuals that are largest in absolute value with the method of least squares.

Comments 

1.  The residuals for the method of least absolute deviations ordinarily will not sum to
zero.

2.  The solution for the estimated regression coefficients with the method of least
absolute deviations may not be unique.

3.  The method of least absolute deviations is also called minimum absolute
deviations, minimum sum of absolute deviations, and minimum $L_1$-norm.

4.  Numerous other robust procedures besides the method of least absolute deviations
have been proposed. Reference 11.3 discusses a number of these procedures.


PROBLEMS 

11.1.  Refer  to  the  example  of  perfectly  correlated  independent  variables  in  Table  11.1. 

a.  Develop  another  model,  like  models  (11.16)  and  (11.17),  that  fits  the  data 
perfectly. 

b.  What  is  the  intersection  of  the  infinitely  many  response  surfaces  that  fit  the 
data  perfectly? 

11.2.  The progress report of a research analyst to the supervisor stated: “All the
estimated regression coefficients in our model with three independent variables to
predict sales are statistically significant. Our new preliminary model with seven
independent variables, which includes the three variables of our smaller model,
is less satisfactory because only two of the seven regression coefficients are
statistically significant. Yet in some initial trials the expanded model is giving
more precise sales predictions than the smaller model. The reasons for this
anomaly are now being investigated.” Comment.

11.3.  Two authors wrote as follows: “Our research utilized a multiple regression
model. Two of the independent variables important in our theory turned out to be
highly correlated in our observations. This made it difficult to assess the
individual effects of each of these variables separately. We retained both variables in
our model, however, because the high coefficient of multiple determination
made this difficulty unimportant.” Comment.

11.4.  A student asked: “Why is it necessary to perform diagnostic checks of the fit
when $R^2$ is large?” Comment.

11.5.  Cosmetics sales. An assistant in the district sales office of a national cosmetics
firm obtained data, shown below, on advertising expenditures and sales last year
in the district’s 14 territories. $X_1$ denotes expenditures for point-of-sale displays
in beauty salons and department stores (in thousand dollars) while $X_2$ and $X_3$
represent the corresponding expenditures for local media advertising and
prorated share of national media advertising, respectively. Y denotes sales (in
thousand cases). The assistant was instructed to estimate the increase in expected
sales when $X_1$ is increased by one thousand dollars and $X_2$ and $X_3$ are held
constant, and was told to use an ordinary multiple regression model with linear
terms for the independent variables and with independent normal error terms.




       i:       1       2       3       4       5       6       7
$X_{i1}$:     4.2     6.5     3.0     2.1     2.9     7.2     4.8
$X_{i2}$:     4.0     6.5     3.5     2.0     3.0     7.0     5.0
$X_{i3}$:     3.0     5.0     4.0     3.0     4.0     3.0     4.5
   $Y_i$:    8.26   14.70    9.73    5.62    7.84   12.18    8.56

       i:       8       9      10      11      12      13      14
$X_{i1}$:     4.3     2.6     3.1     6.2     5.5     2.2     3.0
$X_{i2}$:     4.0     2.5     3.0     6.0     5.5     2.0     2.8
$X_{i3}$:     5.0     5.0     4.0     4.5     5.0     4.0     3.0
   $Y_i$:   10.77    7.56    8.90   12.51   10.46    7.15    6.74

a.  State  the  regression  model  to  be  employed  and  fit  it  to  the  data. 

b.  Test whether there is a regression relation between sales and the three
independent variables. Use a level of significance of .05. State the alternatives,
decision rule, and conclusion.

c.  Test for each of the regression coefficients $\beta_k$ $(k = 1, 2, 3)$ individually
whether or not $\beta_k = 0$. Use a level of significance of .05 each time. Do the
conclusions of these tests correspond to that obtained in part (b)?

d.  Obtain  the  correlation  matrix. 

e.  What  do  the  results  in  parts  (b),  (c),  and  (d)  suggest  about  the  suitability  of 
the  data  for  the  research  objective? 

11.6.  Refer  to  Cosmetics  sales  Problem  11.5. 

a.  Verify that the variance inflation factor for variable $X_1$ is $(VIF)_1 = 66.29$.
The other variance inflation factors are $(VIF)_2 = 66.99$ and $(VIF)_3 = 1.09$.
What do these suggest about the effects of multicollinearity here?

b.  The  assistant  eventually  decided  to  drop  variables  X2  and  X3  from  the  model 
“to  clear  up  the  picture.”  Fit  the  assistant’s  revised  model.  Is  the  assistant 
now  in  a  better  position  to  achieve  the  research  objective? 

c.  Why  would  an  experiment  here  be  more  effective  in  providing  suitable  data 
to  meet  the  research  objective?  How  would  you  design  such  an  experiment? 
What  model  would  you  employ? 

11.7.  Refer  to  Patient  satisfaction  Problem  7.17. 

a.  Obtain the correlation matrix. What does it show about pairwise linear
associations among the independent variables?

b.  The variance inflation factors are $(VIF)_1 = 1.35$, $(VIF)_2 = 2.76$, and
$(VIF)_3 = 2.87$. What do these results suggest about the effects of
multicollinearity here? Are these results more revealing than those in part (a)?

11.8.  Refer  to  Brand  preference  Problem  7.8. 

a. Obtain the correlation matrix. What does it show about the linear association
between the two independent variables?

b.  Find  the  two  variance  inflation  factors.  Why  are  they  both  equal  to  1? 

11.9.  Refer  to  Mathematicians  salaries  Problem  7.20. 

a.  Obtain  the  correlation  matrix.  What  does  it  show  about  the  pairwise  linear 
associations  among  the  independent  variables? 

b. Obtain the variance inflation factors. Do they indicate that a serious multicollinearity
problem exists here?

11.10. Refer to Cosmetics sales Problem 11.5. Given below are the estimated ridge
standardized regression coefficients, the variance inflation factors, and $R^2$ for
selected biasing constants c.

c:           .000    .005     .01     .02     .03     .04     .05     .06
b1^R:        .273    .327    .349    .368    .376    .380    .382    .383
b2^R:        .549    .494    .470    .447    .435    .427    .422    .417
b3^R:        .260    .260    .260    .259    .257    .256    .254    .253
(VIF)1:     66.29   24.11   12.45    5.20    2.92    1.91    1.38    1.07
(VIF)2:     66.99   24.36   12.57    5.25    2.94    1.92    1.39    1.07
(VIF)3:      1.09    1.06    1.04    1.01     .99     .97     .95     .93
R^2:         .840    .838    .836    .832    .828    .824    .821    .816

a.  Make  a  ridge  trace  plot  for  the  given  c  values.  Do  the  ridge  regression 
coefficients  exhibit  substantial  changes  near  c  =  0? 

b. Suggest a reasonable value for the biasing constant c based on the ridge
trace, the (VIF)’s, and $R^2$.

c.  Transform  the  estimated  standardized  regression  coefficients  selected  in  part 
(b)  back  to  the  original  variables  and  obtain  the  fitted  values  for  the  14 
observations.  How  similar  are  these  fitted  values  to  those  obtained  with  the 
ordinary  least  squares  fit  in  Problem  11.5  a? 

11.11. Refer to Chemical shipment Problem 7.12. Given below are the estimated ridge
standardized regression coefficients, variance inflation factors, and $R^2$ for selected
biasing constants c.

c:                  .000    .005     .01     .05     .07     .09     .10     .20
b1^R:               .451    .453    .455    .460    .460    .459    .458    .444
b2^R:               .561    .556    .552    .526    .517    .508    .504    .473
(VIF)1 = (VIF)2:    7.03    6.20    5.51    2.65    2.03    1.61    1.46     .71
R^2:                .987    .984    .982    .962    .952    .943    .940    .894

a.  Make  a  ridge  trace  plot  for  the  given  c  values.  Do  the  regression  coefficients 
exhibit  substantial  changes  near  c  =  0? 

b.  Why  are  the  (VIF)i  values  the  same  as  the  (VIF)2  values  here? 

c. Suggest a reasonable value for the biasing constant c based on the ridge trace
in part (a), the (VIF)’s, and $R^2$.

d.  Transform  the  estimated  standardized  regression  coefficients  selected  in  part 
(c)  back  to  the  original  variables  and  obtain  the  fitted  values  for  the  20 
observations.  How  similar  are  these  fitted  values  to  those  obtained  with  the 
ordinary  least  squares  fit  in  Problem  7.12a? 

11.12. Refer to Brand preference Problem 7.8. The diagonal elements of the hat matrix
are: $h_{55} = h_{66} = h_{77} = h_{88} = h_{99} = h_{10,10} = h_{11,11} = h_{12,12} = .137$ and
$h_{11} = h_{22} = h_{33} = h_{44} = h_{13,13} = h_{14,14} = h_{15,15} = h_{16,16} = .237$.

a.  Explain  the  reason  for  the  pattern  in  the  diagonal  elements  of  the  hat  matrix. 

b. According to the rule of thumb stated in the chapter, are any of the observations
outlying with regard to their X values?

c. Obtain the studentized deleted residuals and identify any outlying Y observations.

d. Calculate Cook’s distance $D_i$ for each observation. Are any observations
influential according to this measure?

11.13. Refer to Chemical shipment Problem 7.12. The diagonal elements of the hat
matrix are:

i:         1      2      3      4      5      6      7      8      9     10
h_ii:    .091   .194   .131   .268   .149   .141   .429   .067   .135   .165

i:        11     12     13     14     15     16     17     18     19     20
h_ii:    .179   .051   .110   .156   .095   .128   .097   .230   .112   .073

a.  Identify  any  outlying  X  observations  using  the  rule  of  thumb  presented  in  the 
chapter. 

b. Obtain the studentized deleted residuals and identify any outlying Y observations.

c. Calculate Cook’s distance $D_i$ for each observation. Are any observations
influential according to this measure? How much is the fitted value for
observation i = 7 changed when all observations are included and when
observation 7 is omitted from the fit? [Hint: $\hat{Y}_{i(i)} = Y_i - e_i/(1 - h_{ii})$.]

11.14.  Refer  to  Patient  satisfaction  Problem  7.17.  The  diagonal  elements  of  the  hat 
matrix  are: 


i:         1      2      3      4      5      6      7      8      9     10     11     12
h_ii:    .134   .193   .070   .235   .204   .319   .060   .174   .339   .104   .209   .143

i:        13     14     15     16     17     18     19     20     21     22     23
h_ii:    .078   .231   .158   .238   .059   .057   .254   .137   .313   .072   .230

a.  Identify  any  outlying  X  observations. 

b. Obtain the studentized deleted residuals and identify any outlying Y observations.

c.  Calculate  Cook’s  distance  for  each  observation.  Are  any  observations 
influential  according  to  this  measure? 

11.15.  Refer  to  Mathematicians  salaries  Problem  7.20.  The  diagonal  elements  of  the 
hat  matrix  are: 


i:         1      2      3      4      5      6      7      8      9     10     11     12
h_ii:    .230   .060   .141   .070   .221   .119   .103   .180   .233   .297   .088   .131

i:        13     14     15     16     17     18     19     20     21     22     23     24
h_ii:    .287   .100   .199   .165   .213   .156   .187   .208   .120   .135   .225   .131

a.  Identify  any  outlying  X  observations. 

b. Obtain the studentized deleted residuals and identify any outlying Y observations.

c. Calculate Cook’s distance $D_i$ for each observation. Are any observations
influential according to this measure?


EXERCISES 

11.16.  Refer  to  the  work  crew  productivity  example  data  in  Table  8.1. 

a. For the variables transformed according to (11.6), obtain: (1) X'X, (2) X'Y,
(3) b, (4) $s^2(\mathbf{b})$.

b. Show that the regression coefficients obtained in part (a3) are the same as
the standardized regression coefficients according to (7.69).

11.17. Show that the least squares estimator of $\beta_0'$ in (11.2a) is $\bar{Y}$.

11.18. Derive the relations between the $\beta_k$ and $\beta_k'$ (k = 1, 2) in (11.8a) and (11.8b).

11.19.  Derive  the  expression  for  X'Y  in  (11.12)  for  the  transformed  model  (11.7). 

11.20.  Derive  the  mean  squared  error  in  (11.35). 

11.21. Refer to the least absolute deviations estimates for the body fat example on page
410, namely, $b_0 = -17.027$, $b_1 = .4173$, and $b_2 = .5203$.

a.  Find  the  sum  of  the  absolute  deviations  of  the  sample  observations  from  the 
fitted  values  based  on  the  least  absolute  deviations  estimates. 

b. For the least squares estimated regression coefficients $b_0 = -19.174$,
$b_1 = .2224$, and $b_2 = .6594$, find the sum of the absolute deviations. Is this
sum larger than the sum obtained in part (a)?

PROJECTS 

11.22.  Refer  to  Patient  satisfaction  Problem  7.17. 

a. Obtain the estimated ridge standardized regression coefficients, variance
inflation factors, and $R^2$ for the following biasing constants: c = .000, .005,
.01, .02, .03, .04, .05.

b.  Make  a  ridge  trace  plot  for  the  given  c  values.  Do  the  ridge  regression 
coefficients  exhibit  substantial  changes  near  c  =  0? 

c. Suggest a reasonable value for the biasing constant c based on the ridge
trace, the (VIF)’s, and $R^2$.

d.  Transform  the  estimated  standardized  regression  coefficients  selected  in  part 
(c)  back  to  the  original  variables  and  obtain  the  fitted  values  for  the  23 
observations.  How  similar  are  these  fitted  values  to  those  obtained  with  the 
ordinary  least  squares  fit  in  Problem  7.17a? 

11.23.  Refer  to  Mathematicians  salaries  Problem  7.20. 

a. Obtain the estimated ridge standardized regression coefficients, variance
inflation factors, and $R^2$ for the following biasing constants: c = .000, .005,
.01, .02, .03, .04, .05.

b.  Make  a  ridge  trace  plot  for  the  given  c  values.  Do  the  ridge  regression 
coefficients  exhibit  substantial  changes  near  c  =  0? 

c. Suggest a reasonable value for the biasing constant c based on the ridge
trace, the (VIF)’s, and $R^2$.

d.  Transform  the  estimated  standardized  regression  coefficients  selected  in  part 
(c)  back  to  the  original  variables  and  obtain  the  fitted  values  for  the  24 
observations.  How  similar  are  these  fitted  values  to  those  obtained  with  the 
ordinary  least  squares  fit  in  Problem  7.20a? 

11.24.  Refer  to  the  SENIC  data  set. 

a. Regress the logarithm of length of stay (Y') on infection risk (X1), number
of beds (X2), and average daily census (X3).

b.  Obtain  the  residuals  and  identify  outliers. 


c. Obtain the correlation matrix and the variance inflation factors. What do
these suggest about the effects of multicollinearity?

d. Obtain the estimated ridge regression coefficients, variance inflation factors,
and $R^2$ for the values of the biasing constant c given in Table 11.6.

e. Make a ridge trace plot and determine a reasonable value for the biasing
constant c based on this plot, the (VIF)’s, and $R^2$.

11.25.  Refer  to  the  SMSA  data  set. 

a. Regress number of active physicians (Y) on number of hospital beds (X1),
total personal income (X2), and total serious crimes (X3).

b.  Obtain  the  residuals  and  identify  outliers. 

c.  Obtain  the  correlation  matrix  and  the  variance  inflation  factors.  What  do 
these  suggest  about  the  effects  of  multicollinearity? 

d. Obtain the estimated ridge regression coefficients, variance inflation factors,
and $R^2$ for the values of the biasing constant c in Table 11.6.

e. Make a ridge trace plot and determine a reasonable value for the biasing
constant c based on this plot, the (VIF)’s, and $R^2$.


CITED  REFERENCES 

11.1  Belsley, David A.; Edwin Kuh; and Roy E. Welsch. Regression Diagnostics:
Identifying Influential Data and Sources of Collinearity. New York: John Wiley &
Sons, 1980.

11.2  Kennedy, William J., Jr., and James E. Gentle. Statistical Computing. New York:
Marcel Dekker, 1980.

11.3  Hogg, Robert V. “Statistical Robustness: One View of Its Use in Applications
Today.” The American Statistician 33 (1979), pp. 108-15.


12 


Selection  of  independent  variables 


One  of  the  most  difficult  problems  in  regression  analysis  often  is  the  selection 
of  the  set  of  independent  variables  to  be  employed  in  the  model.  In  this  chapter, 
we  take  up  several  computer-assisted  search  methods  for  helping  to  identify  one 
or  a  number  of  possible  sets  of  independent  variables  to  be  included  in  the 
regression  model. 

12.1  NATURE  OF  PROBLEM 

As  we  have  seen  in  previous  chapters,  regression  analysis  has  three  major 
uses:  (1)  description,  (2)  control,  and  (3)  prediction.  For  each  of  these  uses,  the 
investigator  must  specify  the  set  of  independent  variables  to  be  employed  for 
describing,  controlling,  and/or  predicting  the  dependent  variable. 

In  some  fields,  theory  can  aid  in  selecting  the  independent  variables  to  be 
employed  and  in  specifying  the  functional  form  of  the  regression  relation.  Often 
in  these  fields,  controlled  experiments  can  be  undertaken  to  furnish  data  on  the 
basis  of  which  the  regression  parameters  can  be  estimated  and  the  theoretical 
form  of  the  regression  function  tested. 

In many other subject matter fields, however, including the social and behavioral
sciences and management, serviceable theoretical models are relatively
rare. To complicate matters further, the available theoretical models may involve
independent variables that are not directly measurable, such as a family’s future
earnings over the next 10 years. Under these conditions, investigators are often
forced to prospect for independent variables that could conceivably be related to
the dependent variable under study. Obviously, such a set of independent variables
will be large. For example, a company’s sales of portable dishwashers in a
district may be affected by population size, per capita income, percent of population
in urban areas, percent of population under 50 years old, percent of families
with children at home, and so on.

After such a lengthy list has been compiled, some of the independent variables
can be screened out. An independent variable (1) may not be fundamental to
the  problem,  (2)  may  be  subject  to  large  measurement  errors,  and/or  (3)  may 
effectively  duplicate  another  independent  variable  in  the  list.  Other  independent 
variables  that  cannot  be  measured  may  either  be  deleted  or  replaced  by  proxy 
variables  that  are  highly  correlated  with  them. 

Typically, the number of independent variables that remain after this initial
screening is still large. Further, many of these variables will be highly intercorrelated.
Hence, the investigator usually will wish to reduce the number of
independent variables to be used in the final model. There are several reasons for
this. A regression model with a large number of independent variables is expensive
to maintain. Further, regression models with a limited number of independent
variables are easier to analyze and understand. Finally, the presence of many
highly intercorrelated independent variables may add little to the predictive
power of the model while substantially increasing the sampling variation of the
regression coefficients, detracting from the model’s descriptive abilities, and
increasing the problem of roundoff errors (as we have seen in Chapter 11).

The  investigator  must  be  careful,  however,  not  to  eliminate  key  explanatory 
variables  because  that  could  seriously  damage  the  explanatory  power  of  the 
model  and  lead  to  biased  estimates  of  regression  coefficients,  mean  responses, 
and  predictions  of  new  observations. 

The  problem  then  is  how  to  shorten  the  list  of  independent  variables  so  as  to 
obtain,  in  some  sense,  a  “good”  selection  of  independent  variables.  This  subset 
of  independent  variables  needs  to  be  small  enough  so  that  maintenance  costs  are 
manageable and analysis is facilitated, yet it must be large enough so that adequate
description, control, or prediction is possible.

Since the purposes of regression analysis vary, no one subset of independent
variables is usually “best” for all uses. For instance, descriptive uses of a regression
model typically will emphasize precise estimation of the regression coefficients,
while predictive uses will focus on the prediction errors. Often, different
subsets of the potential independent variables will best serve these varying purposes.
Even for a given purpose, it is often found that several subsets are about
equally “good” according to a given criterion, and the choice of the subset of
independent variables to be employed in the regression model needs to be made
on the basis of additional considerations. The entire selection process is, and
should be, pragmatic, with large doses of subjective judgment. The selection
procedures to be discussed in this chapter are aids to the investigator’s judgment
and should not be used in a purely mechanical fashion. A mechanical approach,
for instance, might omit an important independent variable just because it occurred
in the sample within a narrow range of values and therefore turned out to
be statistically nonsignificant.

We shall discuss in this chapter two approaches to the selection of independent
variables. The first approach considers all possible regression models that can
be  developed  from  the  pool  of  potential  independent  variables  and  identifies 
subsets  of  the  independent  variables  which  are  “good”  according  to  a  criterion 
specified  by  the  investigator.  The  second  approach  employs  automatic  search 
procedures  to  arrive  at  a  single  subset  of  the  independent  variables. 

The choice of the independent variables to be employed in the regression
model does not, of course, fully determine the model to be utilized. Other determinations
must also be made, such as whether an independent variable appears in
linear form, in a transformed fashion, or with a quadratic term added, and
whether interaction terms should be included in the model. Our discussion of the
choice of independent variables in this chapter assumes that the investigator has
already considered the functional form of the regression relation (whether given
variables are to appear in linear form, quadratic form, etc.), whether the independent
variables or the dependent variable are first transformed (e.g., by a
logarithmic transformation), and whether any interaction terms are to be included.
At this point, a selection procedure is employed to reduce the number of
X variables, which include not only the potential independent variables in first-order
form but also quadratic and other curvature terms and interaction terms.


Note 


All too often, unwary investigators will screen the set of independent variables by
fitting the regression model containing the entire set of potential X variables and then
simply dropping those for which the t* statistic (8.25):

(12.1)    $t_k^* = \dfrac{b_k}{s(b_k)}$

has a small absolute value. As we know from Chapter 11, this procedure can lead to the
dropping of important intercorrelated independent variables. Clearly, a good search procedure
must be able to handle important intercorrelated independent variables in such a
way that not all of them will be dropped.
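
The pitfall described in this note can be illustrated with a small numerical sketch. The data below are synthetic and purely hypothetical (they are not from any example in this book), and the function name `t_statistics` is an assumption introduced for illustration. Two nearly identical X variables are both related to Y; the sketch computes $t_k^* = b_k / s(b_k)$ with both variables in the model and then with only one of them.

```python
import numpy as np

def t_statistics(X, y):
    """Return (b, s(b), t*) for the least squares fit of y on X.

    X must already contain a leading column of ones for the intercept.
    """
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ X.T @ y                    # least squares estimates
    residuals = y - X @ b
    mse = residuals @ residuals / (n - p)    # error mean square
    s_b = np.sqrt(mse * np.diag(xtx_inv))    # estimated standard deviations of b
    return b, s_b, b / s_b                   # t*_k = b_k / s(b_k)

# Synthetic, hypothetical data: x2 is nearly a copy of x1, and both affect y.
rng = np.random.default_rng(0)
x1 = rng.normal(size=40)
x2 = x1 + 0.05 * rng.normal(size=40)
y = 1.0 + x1 + x2 + rng.normal(size=40)

# Model containing both intercorrelated variables.
X_full = np.column_stack([np.ones(40), x1, x2])
b, s_b, t_star = t_statistics(X_full, y)
print("t* with X1 and X2 in the model:", np.round(t_star[1:], 2))

# Model containing X1 alone.
X_one = np.column_stack([np.ones(40), x1])
_, _, t_one = t_statistics(X_one, y)
print("t* with X1 alone:              ", np.round(t_one[1:], 2))
```

With strongly intercorrelated variables, the individual t* values in the full model tend to be much smaller in absolute value than the t* for either variable fitted alone, so dropping every variable with a small t* could discard both.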


12.2  EXAMPLE 

In order to illustrate the selection procedures to be discussed in the following
sections, we shall use a relatively simple example which has four potential independent
variables. By limiting the number of potential independent variables, we
shall be able to explain the selection procedures without overwhelming the reader
with masses of computer printouts.

TABLE 12.1  Potential independent variables and dependent variable: surgical unit example

          Blood                   Enzyme    Liver
Case      Clotting  Prognostic    Function  Function  Survival
Number    Score     Index         Test      Test      Time
          X1        X2            X3        X4        Y         Y' = log10 Y
1         6.7       62            81        2.59      200       2.3010
2         5.1       59            66        1.70      101       2.0043
3         7.4       57            83        2.16      204       2.3096
4         6.5       73            41        2.01      101       2.0043
5         7.8       65            115       4.30      509       2.7067
6         5.8       38            72        1.42      80        1.9031
7         5.7       46            63        1.91      80        1.9031
8         3.7       68            81        2.57      127       2.1038
9         6.0       67            93        2.50      202       2.3054
10        3.7       76            94        2.40      203       2.3075
11        6.3       84            83        4.13      329       2.5172
12        6.7       51            43        1.86      65        1.8129
13        5.8       96            114       3.95      830       2.9191
14        5.8       83            88        3.95      330       2.5185
15        7.7       62            67        3.40      168       2.2253
16        7.4       74            68        2.40      217       2.3365
17        6.0       85            28        2.98      87        1.9395
18        3.7       51            41        1.55      34        1.5315
19        7.3       68            74        3.56      215       2.3324
20        5.6       57            87        3.02      172       2.2355
21        5.2       52            76        2.85      109       2.0374
22        3.4       83            53        1.12      136       2.1335
23        6.7       26            68        2.10      70        1.8451
24        5.8       67            86        3.40      220       2.3424
25        6.3       59            100       2.95      276       2.4409
26        5.8       61            73        3.50      144       2.1584
27        5.2       52            86        2.45      181       2.2577
28        11.2      76            90        5.59      574       2.7589
29        5.2       54            56        2.71      72        1.8573
30        5.8       76            59        2.58      178       2.2504
31        3.2       64            65        0.74      71        1.8513
32        8.7       45            23        2.52      58        1.7634
33        5.0       59            73        3.50      116       2.0645
34        5.8       72            93        3.30      295       2.4698
35        5.4       58            70        2.64      115       2.0607
36        5.3       51            99        2.60      184       2.2648
37        2.6       74            86        2.05      118       2.0719
38        4.3       8             119       2.85      120       2.0792
39        4.8       61            76        2.45      151       2.1790
40        5.4       52            88        1.81      148       2.1703
41        5.2       49            72        1.84      95        1.9777
42        3.6       28            99        1.30      75        1.8751
43        8.8       86            88        6.40      483       2.6840
44        6.5       56            77        2.85      153       2.1847
45        3.4       77            93        1.48      191       2.2810
46        6.5       40            84        3.00      123       2.0899
47        4.5       73            106       3.05      311       2.4928
48        4.8       86            101       4.10      398       2.5999
49        5.1       67            77        2.86      158       2.1987
50        3.9       82            103       4.55      310       2.4914
51        6.6       77            46        1.95      124       2.0934
52        6.4       85            40        1.21      125       2.0969
53        6.4       59            85        2.33      198       2.2967
54        8.8       78            72        3.20      313       2.4955

A hospital surgical unit was interested in predicting survival in patients undergoing
a particular type of liver operation. A random selection of 54 patients was
available  for  analysis.  From  each  patient  record,  the  following  information  was 
extracted  from  the  preoperational  evaluation: 

X1  blood clotting score

X2  prognostic index, which includes the age of patient

X3  enzyme function test score

X4  liver function test score

These constitute the potential independent variables for a predictive regression
model. The dependent variable was survival time, which was ascertained in a
follow-up study. The data on the potential independent variables and the dependent
variable are presented in Table 12.1. Since the survival time distribution is
substantially skewed to the right, the logarithm of the survival time Y' = log10 Y
was taken as the dependent variable.

The surgical unit wished to obtain a subset of independent variables for predicting
Y'. The task was assigned to an analyst, who first obtained the correlation
matrix for all the variables from a computer run. This matrix provides valuable
basic information on the nature of the problems to be encountered. Table 12.2
contains the correlation matrix, omitting the duplicate
terms below the main diagonal.

Table 12.2 indicates that all of the independent variables are linearly associated
with Y', X4 showing the highest degree of association and X1 the lowest.
The correlation matrix further shows intercorrelations among the potential independent
variables. In particular, the individual pairwise correlations between X4
and X1, X2, and X3 are moderately high. The task now is to determine whether all
four independent variables are required for the prediction model.
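
As a rough sketch of the analyst's first step, the correlation matrix can be obtained with NumPy. To keep the block short, only the first six cases of Table 12.1 are entered, so the printed correlations will not match those for the full set of 54 cases; the variable names are assumptions made for illustration.

```python
import numpy as np

# First six cases of Table 12.1; columns are X1, X2, X3, X4, Y.
# (Only a subset is used here, so results differ from the full-data matrix.)
cases = np.array([
    [6.7, 62,  81, 2.59, 200],
    [5.1, 59,  66, 1.70, 101],
    [7.4, 57,  83, 2.16, 204],
    [6.5, 73,  41, 2.01, 101],
    [7.8, 65, 115, 4.30, 509],
    [5.8, 38,  72, 1.42,  80],
])

X = cases[:, :4]
y_log = np.log10(cases[:, 4])            # Y' = log10 Y

# Put Y' first so the layout is Y', X1, X2, X3, X4.
data = np.column_stack([y_log, X])
corr = np.corrcoef(data, rowvar=False)   # 5 x 5 correlation matrix

# Keep only the main diagonal and the terms above it.
upper = np.triu(corr)
print(np.round(upper, 3))
```

Entering all 54 rows of Table 12.1 in `cases` would give the correlations for the complete data set.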


TABLE 12.2  Correlation matrix for surgical unit example

          Y'      X1      X2      X3      X4
Y'      1.000    .346    .593    .665    .726
X1              1.000    .090   -.150    .502
X2                      1.000   -.024    .369
X3                              1.000    .416
X4                                      1.000

12.3  ALL  POSSIBLE  REGRESSION  MODELS 

The all-possible-regressions selection procedure calls for an examination of
all possible regression models involving the potential X variables and identifying
“good” subsets according to some criterion. When this selection procedure is
used with our surgical unit example, for instance, 16 different regression models
are to be considered, as shown in Table 12.3. First, there is the regression model
with no X variables, i.e., the model $Y_i = \beta_0 + \varepsilon_i$. Then there are the regression
models with one X variable (X1, X2, X3, X4), with two X variables (X1 and X2, X1
and X3, X1 and X4, X2 and X3, X2 and X4, X3 and X4), and so on.
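
The count of 16 models can be checked by enumerating every subset of the four potential X variables. The following is a minimal sketch; the helper name `all_subsets` is an assumption introduced here for illustration.

```python
from itertools import combinations

def all_subsets(names):
    """List every subset of the potential X variables, smallest first."""
    subsets = []
    for k in range(len(names) + 1):      # k = number of X variables in the model
        subsets.extend(combinations(names, k))
    return subsets

models = all_subsets(["X1", "X2", "X3", "X4"])
for subset in models:
    print(subset if subset else "None")
print(len(models), "models in all")
```

In general a pool of k potential X variables yields 2^k subset models, which is why the procedure becomes burdensome as the pool grows.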

Different criteria for comparing the various regression models may be used
with the all-possible-regressions selection procedure. We shall discuss three:
$R_p^2$, $MSE_p$, and $C_p$. Before doing so, we need to develop some notation. Let us
denote the number of potential X variables in the pool by $P - 1$. We assume
throughout this chapter that all regression models contain an intercept term $\beta_0$.
Hence, the regression function containing all potential X variables contains P
parameters, and the function with no X variables contains one parameter ($\beta_0$).

The number of X variables in a subset will be denoted by $p - 1$, as always, so
that there are p parameters in the regression function for this subset of X variables.
Thus, we have:

(12.2)    $1 \le p \le P$

The all-possible-regressions approach assumes that the number of observations
n exceeds the maximum number of potential parameters:

(12.3)    $n > P$

and, indeed, it is highly desirable that n be substantially larger than P so that
sound results can be obtained.


$R_p^2$ criterion

The $R_p^2$ criterion calls for an examination of the coefficient of multiple determination
$R^2$, defined in (7.31), in order to select one or several subsets of X
variables. We show the number of parameters in the regression model as a subscript
of $R^2$. Thus, $R_p^2$ indicates that there are p parameters, or $p - 1$ predictor
variables, in the regression equation on which $R_p^2$ is based.

Since $R_p^2$ is a ratio of sums of squares:

(12.4)    $R_p^2 = \dfrac{SSR_p}{SSTO} = 1 - \dfrac{SSE_p}{SSTO}$

and the denominator is constant for all possible regressions, $R_p^2$ varies inversely
with the error sum of squares $SSE_p$. But we know that $SSE_p$ can never increase
as additional independent variables are included in the model. Thus, $R_p^2$ will be a
maximum when all $P - 1$ potential X variables are included in the regression
model. The reason for using the $R_p^2$ criterion with the all-possible-regressions
approach therefore cannot be to maximize $R_p^2$. Rather, the intent is to find the
point where adding more X variables is not worthwhile because it leads to a very
small increase in $R_p^2$. Often, this point is reached when only a limited number of
X variables is included in the regression model. Clearly, the determination of
where diminishing returns set in is a judgmental one.


TABLE 12.3  $R_p^2$, $MSE_p$, and $C_p$ values for all possible regression models: surgical unit example

X Variables         (1)    (2)     (3)        (4)       (5)        (6)
in Model             p     df     SSE_p      R2_p      MSE_p       C_p
None                 1     53     3.9728     0         .0750     1,722.6
X1                   2     52     3.4960     .120      .0672     1,510.7
X2                   2     52     2.5762     .352      .0495     1,100.1
X3                   2     52     2.2154     .442      .0426       939.0
X4                   2     52     1.8777     .527      .0361       788.3
X1, X2               3     51     2.2324     .438      .0438       948.6
X1, X3               3     51     1.4073     .646      .0276       580.3
X1, X4               3     51     1.8759     .528      .0368       789.5
X2, X3               3     51      .7431     .813      .0146       283.7
X2, X4               3     51     1.3922     .650      .0273       573.5
X3, X4               3     51     1.2455     .687      .0244       508.0
X1, X2, X3           4     50      .1099     .972      .00220        3.1
X1, X2, X4           4     50     1.3905     .650      .0278       574.8
X1, X3, X4           4     50     1.1157     .719      .0223       452.1
X2, X3, X4           4     50      .4653     .883      .00931      161.7
X1, X2, X3, X4       5     49      .1098     .972      .00224        5.0

Example. Table 12.3, column 4, contains the $R_p^2$ values for all possible
regression models for our surgical unit example. The data were obtained from a
series of computer runs. For instance, when X4 is the only X variable in the
regression model, we obtain:

$R_2^2 = 1 - \dfrac{SSE(X_4)}{SSTO} = 1 - \dfrac{1.8777}{3.9728} = .527$

Note that $SSTO = SSE_1 = 3.9728$.

The $R_p^2$ values are plotted in Figure 12.1. The maximum $R_p^2$ value for the
possible subsets of $p - 1$ predictor variables, denoted by max($R_p^2$), appears at the
top of the graph for each p. These points are connected by dashed lines to show
the impact of adding X variables. Figure 12.1 makes it clear that little
increase in max($R_p^2$) takes place after three X variables are included in the model.
Hence, the use of the subset (X1, X2, X3) in the regression model appears to be
reasonable according to the $R_p^2$ criterion.

Note that variable X4, which singly correlates most highly with the dependent
variable, is not in the max($R_p^2$) models for p = 3 and p = 4, indicating that X2
and X3 contain much of the information presented by X4. If it were desired that X4
be retained in the model and that the subset model be limited to three X variables,
the subset (X2, X3, X4) should then be considered as next best according to the $R_p^2$
criterion for p = 4. The coefficient of multiple determination associated with this
subset, $R_4^2 = .883$, would be somewhat smaller than $R_4^2 = .972$ for the subset
(X1, X2, X3).
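
The $R_p^2$ comparison above can be reproduced from the $SSE_p$ values of Table 12.3 alone. The sketch below applies (12.4) to each subset and picks out max($R_p^2$) for each p; the dictionary layout and names such as `best_by_p` are assumptions made for illustration.

```python
SSTO = 3.9728   # equals SSE for the model with no X variables

# SSE_p values from Table 12.3, keyed by the X variables in the model.
SSE = {
    (): 3.9728,
    ("X1",): 3.4960, ("X2",): 2.5762, ("X3",): 2.2154, ("X4",): 1.8777,
    ("X1", "X2"): 2.2324, ("X1", "X3"): 1.4073, ("X1", "X4"): 1.8759,
    ("X2", "X3"): 0.7431, ("X2", "X4"): 1.3922, ("X3", "X4"): 1.2455,
    ("X1", "X2", "X3"): 0.1099, ("X1", "X2", "X4"): 1.3905,
    ("X1", "X3", "X4"): 1.1157, ("X2", "X3", "X4"): 0.4653,
    ("X1", "X2", "X3", "X4"): 0.1098,
}

def r_squared(sse_p, ssto=SSTO):
    """R_p^2 = 1 - SSE_p / SSTO, as in (12.4)."""
    return 1 - sse_p / ssto

# For each p (number of parameters), keep the subset with the largest R_p^2.
best_by_p = {}
for subset, sse_p in SSE.items():
    p = len(subset) + 1
    r2 = r_squared(sse_p)
    if p not in best_by_p or r2 > best_by_p[p][1]:
        best_by_p[p] = (subset, r2)

for p in sorted(best_by_p):
    subset, r2 = best_by_p[p]
    print(p, subset if subset else "None", round(r2, 3))
```

Printing the successive max($R_p^2$) values side by side makes the point of diminishing returns easy to see, although deciding where it lies remains a matter of judgment.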


MSEp  or  Rl  criterion 

Since  Rp  does  not  take  account  of  the  number  of  parameters  in  the  model,  and 
since  max (Rp)  can  never  decrease  as  p  increases,  the  use  of  the  adjusted  coeffi- 


424  /  Selection  of  independent  variables 


FIGURE  12.1  Rp  plot  for  surgical  unit  example 


cient  of  multiple  determination  R «  in  (7.33): 


(12.5) 


R 


2  _ 


1  - 


n  —  1 


SSE 


n-p  SSTO 


MSE 

SSTO 


n 


1 


has been suggested as a criterion which takes the number of parameters in the
model into account through the degrees of freedom. It can be seen from (12.5)
that R_a^2 increases if and only if MSE decreases since SSTO/(n - 1) is fixed for
the given Y observations. Hence, R_a^2 and MSE are equivalent criteria. We shall
consider here the criterion MSE_p. Min(MSE_p) can, indeed, increase as p increases
when the reduction in SSE_p becomes so small that it is not sufficient to offset
the loss of an additional degree of freedom. Users of the MSE_p criterion either
seek to find the subset of X variables that minimizes MSE_p, or one or several
subsets for which MSE_p is so close to the minimum that adding more variables is
not worthwhile.
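
The equivalence of the two criteria can be checked numerically. The following is a minimal sketch, not part of the original text: the function names are illustrative, and the inputs are the error sums of squares printed in Figure 12.4 for the surgical unit example (n = 54).

```python
def mse_p(sse: float, n: int, p: int) -> float:
    """Error mean square for a subset model with p parameters."""
    return sse / (n - p)


def adjusted_r2(sse: float, ssto: float, n: int, p: int) -> float:
    """Adjusted coefficient of multiple determination, following (12.5)."""
    return 1.0 - ((n - 1) / (n - p)) * (sse / ssto)


n, ssto = 54, 3.9727688                               # SSTO from step 0
with_x4 = adjusted_r2(0.10974741, ssto, n, p=5)       # (X1, X2, X3, X4)
without_x4 = adjusted_r2(0.10983557, ssto, n, p=4)    # (X1, X2, X3)
print(round(with_x4, 4), round(without_x4, 4))
print(mse_p(0.10974741, n, 5), mse_p(0.10983557, n, 4))
```

Dropping X4 raises SSE slightly, yet MSE falls and the adjusted coefficient rises, because the regained degree of freedom more than offsets the small loss of fit.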




Example. The MSE_p values for all possible regression models for our surgical
unit example are shown in Table 12.3, column 5. For instance, if the regression
model contains only X4, we have:

MSE_2 = \frac{SSE(X_4)}{n - 2} = \frac{1.8777}{52} = .0361


Figure 12.2 contains the MSE_p plot for our example. We have connected the
min(MSE_p) values for each p by dashed lines. The story which Figure 12.2 tells
is very similar to that told by Figure 12.1. The subset (X1, X2, X3) appears to be
best. Indeed, the mean square error achieved with this subset is practically the
same as that with (X1, X2, X3, X4), which uses all potential X variables.

If X4 were to be included in the model with p = 4, the subset (X2, X3, X4)
would be best, involving MSE_4 = .009, which is somewhat higher than MSE_4 =
.002 for subset (X1, X2, X3).


FIGURE 12.2  MSE_p plot for surgical unit example (horizontal axis: p)




C_p criterion

This criterion is concerned with the total mean squared error of the n fitted
values for each of the various subset regression models. The mean squared error
concept involves a bias component and a random error component. The mean
squared error for an estimated regression coefficient was defined in (11.35).
Here, the mean squared error pertains to the fitted values Ŷ_i for the regression
model employed. The bias component for the ith fitted value Ŷ_i is:

(12.6)  E(\hat{Y}_i) - E(Y_i)

where E(Ŷ_i) is the expectation of the ith fitted value for the given regression
model and E(Y_i) is the true mean response. The random error component for Ŷ_i is
simply σ^2(Ŷ_i), its variance. The mean squared error for Ŷ_i is then the sum of
the squared bias and the variance:

(12.7)  [E(\hat{Y}_i) - E(Y_i)]^2 + \sigma^2(\hat{Y}_i)

The total mean squared error for all n fitted values Ŷ_i is the sum of the n
individual mean squared errors:

(12.8)  \sum_{i=1}^{n} [E(\hat{Y}_i) - E(Y_i)]^2 + \sum_{i=1}^{n} \sigma^2(\hat{Y}_i)

The criterion measure, denoted by Γ_p, is simply the total mean squared error
divided by σ^2, the true error variance:

(12.9)  \Gamma_p = \frac{1}{\sigma^2}\left[\sum_{i=1}^{n} [E(\hat{Y}_i) - E(Y_i)]^2 + \sum_{i=1}^{n} \sigma^2(\hat{Y}_i)\right]


The model which includes all P - 1 potential X variables is assumed to have
been carefully chosen so that MSE(X_1, ..., X_{P-1}) is an unbiased estimator of
σ^2. It can then be shown that an estimator of Γ_p is C_p:

(12.10)  C_p = \frac{SSE_p}{MSE(X_1, \ldots, X_{P-1})} - (n - 2p)

where SSE_p is the error sum of squares for the fitted subset regression model with
p parameters (i.e., with p - 1 predictor variables).

When there is no bias in the regression model with p - 1 predictor variables
so that E(Ŷ_i) = E(Y_i), the expected value of C_p is approximately p:

(12.11)  E[C_p \mid E(\hat{Y}_i) = E(Y_i)] \approx p

Thus, when the C_p values for all possible regression models are plotted against p,
those models with little bias will tend to fall near the line C_p = p. Models with
substantial bias will tend to fall considerably above this line.

In using the C_p criterion, one seeks to identify subsets of X variables for which
(1) the C_p value is small and (2) the C_p value is near p. Sets of X variables with
small C_p values have a small total mean squared error, and when the C_p value is
also near p, the bias of the regression model is small. It may sometimes occur
that the regression model based on the subset of X variables with the smallest C_p
value involves substantial bias. In that case, one may at times prefer a regression
model based on a somewhat larger subset of X variables for which the C_p value is
slightly larger but which does not involve a substantial bias component.
Reference 12.1 contains extended discussions of applications of the C_p criterion.


Example. Table 12.3, column 6, contains the C_p values for all possible
regression models for our surgical unit example. For instance, when X4 is the
only X variable in the regression model, the C_p value is:

C_2 = \frac{SSE(X_4)}{MSE(X_1, X_2, X_3, X_4)} - [n - 2(2)] = \frac{1.8777}{.00224} - (54 - 4) = 788.3
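
The calculation in (12.10) is easy to reproduce. The following is a minimal sketch, not part of the original text; the function name `cp_criterion` is illustrative, and the inputs are the values quoted in the example above.

```python
def cp_criterion(sse_p: float, mse_full: float, n: int, p: int) -> float:
    """C_p as in (12.10): SSE_p / MSE(all P - 1 variables) - (n - 2p)."""
    return sse_p / mse_full - (n - 2 * p)


# Model containing only X4: p = 2 parameters, n = 54 observations.
cp_x4 = cp_criterion(1.8777, 0.00224, n=54, p=2)
print(round(cp_x4, 1))

# For the full model, SSE / MSE equals n - P, so C_P reduces to P.
sse_full, n, P = 0.10974741, 54, 5
print(cp_criterion(sse_full, sse_full / (n - P), n, P))
```

The second call shows why the full model always falls on the line C_p = p: its C_p value carries no information about bias, since the full model is assumed unbiased by construction.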


The C_p values for all possible regression models are plotted in Figure 12.3. We
find again that subset (X1, X2, X3) is suggested. This subset has the smallest C_p
value, with no indication of any bias in the regression model. The fact that the
C_p measure for this model, C_4 = 3.1, is below p = 4 is the result of random
variation in the C_p measure.


FIGURE 12.3  C_p plot for surgical unit example (horizontal axis: p)


Note that use of all potential X variables (X1, X2, X3, X4) would involve a
larger total mean squared error. Also, use of subset (X2, X3, X4) with C_p value
C_4 = 161.7 would be poor because of the substantial bias with that model.

Comments 

1. Effective use of the C_p criterion requires careful development of the pool of
P - 1 potential X variables, with the independent variables expressed in appropriate
form (linear, quadratic, transformed) and useless variables excluded so that
MSE(X_1, ..., X_{P-1}) provides an unbiased estimate of the error variance σ^2.

2. The C_p criterion places major emphasis on the fit of the subset model for the n
sample observations. At times, a modification of the C_p criterion which emphasizes
new observations to be predicted might be preferable.

3. To see why C_p as defined in (12.10) is an estimator of Γ_p, we need to utilize
two results that we shall simply state. First, it can be shown that:

(12.12)  \sum_{i=1}^{n} \sigma^2(\hat{Y}_i) = p\sigma^2

Thus, the total random error of the n fitted values Ŷ_i increases as the number of
variables in the regression model increases.

Further, it can be shown that:

(12.13)  E(SSE_p) = \sum_{i=1}^{n} [E(\hat{Y}_i) - E(Y_i)]^2 + (n - p)\sigma^2

Hence, Γ_p in (12.9) can be expressed as follows:

(12.14)  \Gamma_p = \frac{1}{\sigma^2}[E(SSE_p) - (n - p)\sigma^2 + p\sigma^2] = \frac{E(SSE_p)}{\sigma^2} - (n - 2p)

Replacing E(SSE_p) by the estimator SSE_p and using MSE(X_1, ..., X_{P-1}) as an
estimator of σ^2 yields C_p in (12.10).


Identification  of  “best”  subsets  by  use  of  algorithm 

A major disadvantage of the all-possible-regressions selection procedure is
the amount of computation required. Since each potential independent variable
either can be included or excluded, there are 2^(P-1) possible regression models
when there are P - 1 potential X variables. When P - 1 = 10, for instance,
there are 1,024 possible regression models. With the availability of large
computers today, running all possible regression models for as many as 10 potential
X variables is not too time consuming. Beyond that, however, the development
of all possible regression models becomes inefficient.
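
The count of 2^(P-1) models can be confirmed by direct enumeration. The following is a small sketch, not part of the original text; the helper name `all_subsets` is illustrative, and the empty subset corresponds to the intercept-only model.

```python
from itertools import combinations


def all_subsets(variables):
    """Yield every subset of the candidate predictors, including the empty one."""
    pool = list(variables)
    for size in range(len(pool) + 1):
        yield from combinations(pool, size)


surgical = list(all_subsets(["X1", "X2", "X3", "X4"]))
print(len(surgical))                              # subsets for P - 1 = 4
print(sum(1 for _ in all_subsets(range(10))))     # subsets for P - 1 = 10
```

Each additional candidate variable doubles the number of models to fit, which is why exhaustive enumeration quickly becomes impractical.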

Even when all possible regression models can easily be calculated, as when
there are eight variables in the pool of potential X variables, it may be difficult
for the investigator to evaluate carefully all of the regression models fitted. In
the case of 8 potential X variables, for instance, there would be 256 models to
consider, a major task for an investigator.

Thus,  whenever  the  pool  of  potential  X  variables  is  not  very  small,  it  is  highly 
desirable  that  the  investigator  be  able  to  concentrate  on  a  limited  number  of 
regression  models  which  are  the  “best”  ones  according  to  a  specified  criterion. 
This limited number might consist of the “best” 5 or 10 subsets according to the
criterion employed, so that the investigator can then carefully study these
regression models for choosing the final model to be employed.

Time-saving  algorithms  have  been  developed  in  which  the  “best”  subsets 
according  to  a  specified  criterion  are  identified  without  requiring  the  fitting  of 
most of the possible subset regression models. Thus, if, say, the C_p criterion is
to be employed and the five “best” subsets according to this criterion are to be
identified, these algorithms search for the five subsets of X variables with the
smallest C_p values using much less computational effort than when all possible
subsets  are  evaluated.  These  algorithms  are  called  “best”  subsets  algorithms. 
Not  only  do  these  algorithms  provide  the  best  subsets  according  to  the  specified 
criterion,  but  they  often  will  also  provide  a  number  of  “good”  subsets  for  each 
possible  number  of  X  variables  in  the  model  to  give  the  investigator  helpful 
information  in  making  the  final  selection  of  the  subset  of  X  variables  to  be 
employed  in  the  regression  model. 

Example. In our surgical unit example, use of one of the “best” subsets
algorithms will provide a portion of the information in Table 12.3. Suppose that
the C_p criterion is to be employed and that the “best” three subsets are to be
identified. The algorithm will then identify subsets (X1, X2, X3), (X1, X2, X3,
X4), and (X2, X3, X4) as the three subsets with the smallest C_p values. In addition,
information about three “good” subsets for each level of p may be provided.
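
The ranking step of such a search can be sketched as follows. This is not one of the time-saving algorithms themselves: it is a brute-force ranking, and it covers only the five models whose error sums of squares are printed in Figure 12.4, not all 16 subsets of Table 12.3. The dictionary `SSE` and the helper `cp` are illustrative names.

```python
import heapq

N = 54                      # number of observations
MSE_FULL = 0.2239743e-02    # MSE(X1, X2, X3, X4), from Figure 12.4, step 4

# Error sums of squares printed in Figure 12.4 for five of the subset models.
SSE = {
    ("X4",): 1.8776188,
    ("X3", "X4"): 1.2453051,
    ("X2", "X3", "X4"): 0.46518236,
    ("X1", "X2", "X3"): 0.10983557,
    ("X1", "X2", "X3", "X4"): 0.10974741,
}


def cp(subset):
    """C_p for a subset; p counts the intercept plus the predictors."""
    p = len(subset) + 1
    return SSE[subset] / MSE_FULL - (N - 2 * p)


best_three = heapq.nsmallest(3, SSE, key=cp)
for subset in best_three:
    print(subset, round(cp(subset), 1))
```

Among these five models the ordering agrees with the three subsets named in the example. A genuine “best” subsets algorithm reaches the same ranking without fitting every model.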


Some  final  comments 

The all-possible-regressions selection approach or a “best” subsets algorithm
leads to the identification of a small number of subsets which are “good”
according to a specified criterion. While in our surgical unit example, each of the
three criteria pointed to the same “best” subset, this is not always the case. It
therefore may be desirable at times to consider more than one criterion in
evaluating possible subsets of X variables.

Once the investigator has identified a few subsets as “good” ones, a final
choice of the model variables must be made. This choice is aided by residual
analyses and examinations of influential observations for each of the competing
models. Information gained by these analyses, together with knowledge by the
investigator about the phenomenon under study, will be helpful in choosing the
final regression model to be employed.




12.4  STEPWISE  REGRESSION 

In those occasional cases when the pool of potential X variables contains 40 to
60 or even more variables, use of a “best” subsets algorithm may not be feasible.
An automatic search procedure that develops sequentially the subset of X
variables to be included in the regression model may be helpful in those cases.
The stepwise regression procedure is probably the most widely used of the
automatic search methods. It was developed to economize on computational efforts,
as compared with the all-possible-regressions approach, while arriving at a
reasonably “good” subset of independent variables. Essentially, this search method
develops a sequence of regression models, at each step adding or deleting an X
variable. The criterion for adding or deleting an X variable can be stated
equivalently in terms of error sum of squares reduction, coefficient of partial
correlation, or F* statistic.


Search  algorithm 

We  shall  describe  the  stepwise  regression  search  algorithm  in  terms  of  the  F* 
statistic  for  the  partial  F  test. 

1. The stepwise regression routine first fits a simple regression model for
each of the P - 1 potential X variables. For each simple regression model, the
F* statistic (3.59) for testing whether or not the slope is zero is obtained:

(12.15)  F_k^* = \frac{MSR(X_k)}{MSE(X_k)}

Recall that MSR(Xk) = SSR(Xk) measures the reduction in the total variation of Y
associated with the use of the variable Xk. The X variable with the largest F*
value is the candidate for first addition. If this F* value exceeds a predetermined
level, the X variable is added. Otherwise, the program terminates with no X
variable considered sufficiently helpful to enter the regression model.

2. Assume X7 is the variable entered at step 1. The stepwise regression
routine now fits all regression models with two X variables, where X7 is one of the
pair. For each such regression model, the partial F test statistic (8.24):

(12.16)  F_k^* = \frac{MSR(X_k \mid X_7)}{MSE(X_7, X_k)} = \left[\frac{b_k}{s(b_k)}\right]^2

is obtained. This is the statistic for testing whether or not β_k = 0 when X7 and Xk
are the variables in the model. The X variable with the largest F* value is the
candidate for addition at the second stage. If this F* value exceeds a
predetermined level, the second X variable is added. Otherwise the program
terminates.

3. Suppose X3 is added at the second stage. Now the stepwise regression
routine examines whether any of the other X variables already in the model
should be dropped. For our illustration, there is at this stage only one other X
variable in the model, X7, so that only one partial F test statistic is obtained:

(12.17)  F_7^* = \frac{MSR(X_7 \mid X_3)}{MSE(X_3, X_7)}

At later stages, there would be a number of these F* statistics, for each of the
variables in the model besides the one last added. The variable for which this F*
value is smallest is the candidate for deletion. If this F* value falls below a
predetermined limit, the variable is dropped from the model; otherwise, it is
retained.

4. Suppose X7 is retained so that both X3 and X7 are now in the model. The
stepwise regression routine now examines which X variable is the next candidate
for addition, then examines whether any of the variables already in the model
should now be dropped, and so on until no further X variables can either be added
or deleted, at which point the search terminates.

It should be noted that the stepwise regression algorithm allows an X variable,
brought into the model at an earlier stage, to be dropped subsequently if it is no
longer helpful in conjunction with variables added at later stages.
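
The search just described can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the routine of any particular package: the helper names (`sse`, `partial_f`, `stepwise`), the `max_steps` guard, and the simulated data are all hypothetical. The least squares fit uses plain Gauss-Jordan elimination on the normal equations for brevity, and the deletion test checks every variable in the model, which is harmless because the variable just added always has an F* above the F-to-enter limit.

```python
import random


def sse(y, columns):
    """Error sum of squares for y regressed on an intercept plus `columns`."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for j in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(j, k), key=lambda r: abs(a[r][j]))
        a[j], a[pivot] = a[pivot], a[j]
        for r in range(k):
            if r != j:
                factor = a[r][j] / a[j][j]
                a[r] = [u - factor * v for u, v in zip(a[r], a[j])]
    b = [a[i][k] / a[i][i] for i in range(k)]
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
               for r, yi in zip(rows, y))


def partial_f(y, data, others, k):
    """F* for X_k given that the variables in `others` are in the model."""
    reduced = sse(y, [data[v] for v in others])
    full = sse(y, [data[v] for v in others] + [data[k]])
    df_error = len(y) - len(others) - 2  # n minus number of parameters
    return (reduced - full) / (full / df_error)


def stepwise(y, data, f_enter=4.0, f_remove=3.9, max_steps=100):
    model = []
    for _ in range(max_steps):
        changed = False
        outside = [v for v in data if v not in model]
        if outside:  # candidate for addition: largest F*
            f_best, cand = max((partial_f(y, data, model, v), v) for v in outside)
            if f_best > f_enter:
                model.append(cand)
                changed = True
        if len(model) > 1:  # candidate for deletion: smallest F*
            f_min, cand = min(
                (partial_f(y, data, [u for u in model if u != v], v), v)
                for v in model)
            if f_min < f_remove:
                model.remove(cand)
                changed = True
        if not changed:
            break
    return model


# Simulated data: Y depends on X1 and X2 only; X3 is unrelated noise.
rng = random.Random(0)
n = 60
data = {name: [rng.gauss(0, 1) for _ in range(n)] for name in ("X1", "X2", "X3")}
y = [2 + 3 * a - 2 * b + rng.gauss(0, 0.5) for a, b in zip(data["X1"], data["X2"])]
selected = stepwise(y, data)
print(sorted(selected))
```

With a signal this strong, X1 and X2 are certain to be selected. Whether the noise variable X3 also enters depends on the particular sample, which illustrates why the F limits carry no exact probabilistic guarantee.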

Example 

Figure 12.4 shows the computer printout obtained when a particular stepwise
regression routine (BMDP2R, Ref. 12.2) was applied to our surgical unit example.
The minimum acceptable F limit for adding a variable and the maximum
acceptable F limit for removing a variable were specified to be 4.0 and 3.9,
respectively, as shown at the top left of Figure 12.4. Since the degrees of
freedom associated with MSE vary, depending on the number of X variables in the
model, and since repeated tests on the same data are undertaken, fixed F limits
for adding or deleting a variable have no precise probabilistic meaning. Note,
however, that F(.95; 1, 60) = 4.00, so that the specified F limits of 4.0 and 3.9
would correspond roughly to a level of significance of .05 for any single test
based on approximately 50 degrees of freedom.

The minimum acceptable tolerance of .01 shown in the upper left of
Figure 12.4 is a specification to guard against the entry of a variable that is
highly correlated with the other X variables already in the model. As explained in
Section 11.3, the tolerance is defined as 1 - R_k^2, where R_k^2 is the coefficient
of multiple determination when Xk is regressed on the other X variables in the
regression model. The tolerance specification of .01 in Figure 12.4 provides that
no variable is to be added to the model which has a coefficient of multiple
determination with the other X variables already in the model which exceeds
1 - .01 = .99 or which would cause the R_k^2 for any variable in the model to
exceed .99.

FIGURE 12.4  Stepwise regression for surgical unit example (BMDP2R, Ref. 12.2)

REGRESSION TITLE . . . . . . . . . . . . . . .  SURGICAL UNIT DATA
DEPENDENT VARIABLE . . . . . . . . . . . . . .  5 Y
MINIMUM ACCEPTABLE F TO ENTER  . . . . . . . .  4.000, 4.000
MAXIMUM ACCEPTABLE F TO REMOVE . . . . . . . .  3.900, 3.900
MINIMUM ACCEPTABLE TOLERANCE . . . . . . . . .  0.01000
SUBSCRIPTS OF THE INDEPENDENT VARIABLES  . . .  1 2 3 4

STEP NO. 0

ANALYSIS OF VARIANCE
                 SUM OF SQUARES    MEAN SQUARE
   RESIDUAL      3.9727688         0.7495785E-01
   (annotated in the figure: sum of squares = SSTO; mean square = SSTO/(n - 1))

VARIABLES IN EQUATION
   VARIABLE        COEFFICIENT
   (Y-INTERCEPT    2.206)           (annotated in the figure: Ȳ)

VARIABLES NOT IN EQUATION
   VARIABLE   PARTIAL CORR.   TOLERANCE   F TO ENTER   LEVEL
   X1  1         0.34640       1.00000        7.09       1
   X2  2         0.59289       1.00000       28.19       1
   X3  3         0.66513       1.00000       41.26       1
   X4  4         0.72621       1.00000       58.02       1

STEP NO. 1
VARIABLE ENTERED   4 X4

ANALYSIS OF VARIANCE
                 SUM OF SQUARES    DF    MEAN SQUARE
   REGRESSION    2.0951490          1    2.095149
   RESIDUAL      1.8776188         52    0.3610805E-01
   (annotated in the figure: regression sum of squares = SSR(X4); regression mean square = MSR(X4))

VARIABLES IN EQUATION
   VARIABLE        COEFFICIENT   STD. ERROR   STD REG   TOLERANCE   F TO REMOVE   LEVEL
                                 OF COEFF     COEFF
   (Y-INTERCEPT    1.696)
   X4  4           0.186         0.024        0.726     1.00000       58.02         1

VARIABLES NOT IN EQUATION
   VARIABLE   PARTIAL CORR.   TOLERANCE   F TO ENTER   LEVEL
   X1  1        -0.03105       0.74758        0.05       1
   X2  2         0.50849       0.86382       17.79       1
   X3  3         0.58031       0.82659       25.90       1

STEP NO. 2
VARIABLE ENTERED   3 X3

MULTIPLE R            0.8286
MULTIPLE R-SQUARE     0.6865
ADJUSTED R-SQUARE     0.6742
STD. ERROR OF EST.    0.1563

ANALYSIS OF VARIANCE
                 SUM OF SQUARES    DF    MEAN SQUARE      F RATIO
   REGRESSION    2.7274637          2    1.363731         55.85
   RESIDUAL      1.2453051         51    0.2441774E-01

VARIABLES IN EQUATION
   VARIABLE        COEFFICIENT   STD. ERROR   STD REG   TOLERANCE   F TO REMOVE   LEVEL
                                 OF COEFF     COEFF
   (Y-INTERCEPT    1.389)
   X3  3           0.006         0.001        0.439     0.82659       25.90         1
   X4  4           0.139         0.022        0.543     0.82659       39.72         1

VARIABLES NOT IN EQUATION
   VARIABLE   PARTIAL CORR.   TOLERANCE   F TO ENTER   LEVEL
   X1  1         0.32276       0.59179        5.81       1
   X2  2         0.79149       0.82580       83.85       1


We shall now follow through the steps.

1. At step 0, no X variable is in the model so that the model to be fitted is
Y_i = β_0 + ε_i. The residual or error sum of squares shown in the ANOVA table in
Figure 12.4 for step 0 is therefore Σ(Y_i - Ȳ)^2 = SSTO = 3.9728. For each
potential X variable, the F* statistic (12.15) is calculated. In Figure 12.4, these
F*_k values are shown under the heading “Variables not in equation” and are called
“F to enter” values. We see that F*_4 is the largest one:

F_4^* = \frac{MSR(X_4)}{MSE(X_4)} = \frac{2.09515}{.03611} = 58.02

Since this value exceeds the minimum acceptable F-to-enter value 4.0, X4 is
added to the model.

The  column  headed  “Level”  refers  to  a  package  option  which  permits  the 
user  to  give  different  priorities  to  the  various  potential  X  variables.  Note  that  in 
the  present  example  all  X  variables  have  the  same  priority. 

2.  At  this  stage,  step  1  has  been  completed.  The  current  regression  model 
contains  X4,  and  the  estimated  regression  coefficients,  the  analysis  of  variance 
table,  and  selected  other  information  about  the  current  model  are  provided. 


FIGURE 12.4  (concluded)

STEP NO. 3
VARIABLE ENTERED   2 X2

MULTIPLE R            0.9396
MULTIPLE R-SQUARE     0.8829
ADJUSTED R-SQUARE     0.8759
STD. ERROR OF EST.    0.0965

ANALYSIS OF VARIANCE
                 SUM OF SQUARES    DF    MEAN SQUARE      F RATIO
   REGRESSION    3.5075836          3    1.169194         125.67
   RESIDUAL      0.46518236        50    0.9303644E-02

VARIABLES IN EQUATION
   VARIABLE        COEFFICIENT   STD. ERROR   STD REG   TOLERANCE   F TO REMOVE   LEVEL
                                 OF COEFF     COEFF
   (Y-INTERCEPT    0.942)
   X2  2           0.008         0.001        0.488     0.82580       83.85         1
   X3  3           0.007         0.001        0.543     0.79021       99.63         1
   X4  4           0.082         0.015        0.320     0.68298       29.86         1

VARIABLES NOT IN EQUATION
   VARIABLE   PARTIAL CORR.   TOLERANCE   F TO ENTER   LEVEL
   X1  1         0.87411       0.55586      158.69       1

STEP NO. 4
VARIABLE ENTERED   1 X1

MULTIPLE R            0.9861
MULTIPLE R-SQUARE     0.9724
ADJUSTED R-SQUARE     0.9701
STD. ERROR OF EST.    0.0473

ANALYSIS OF VARIANCE
                 SUM OF SQUARES    DF    MEAN SQUARE      F RATIO
   REGRESSION    3.8630199          4    0.9657550        431.19
   RESIDUAL      0.10974741        49    0.2239743E-02

VARIABLES IN EQUATION
   VARIABLE        COEFFICIENT   STD. ERROR   STD REG   TOLERANCE   F TO REMOVE   LEVEL
                                 OF COEFF     COEFF
   (Y-INTERCEPT    0.489)
   X1  1           0.069         0.005        0.401     0.55586      158.69         1
   X2  2           0.009         0.000        0.571     0.77567      449.08         1
   X3  3           0.009         0.000        0.736     0.59594      571.83         1
   X4  4           0.002         0.010        0.008     0.39134        0.04         1

VARIABLES NOT IN EQUATION
   (none)

STEP NO. 5
VARIABLE REMOVED   4 X4

MULTIPLE R            0.9861
MULTIPLE R-SQUARE     0.9724
ADJUSTED R-SQUARE     0.9707
STD. ERROR OF EST.    0.0469

ANALYSIS OF VARIANCE
                 SUM OF SQUARES    DF    MEAN SQUARE      F RATIO
   REGRESSION    3.8629322          3    1.287643         586.17
   RESIDUAL      0.10983557        50    0.2196711E-02

VARIABLES IN EQUATION
   VARIABLE        COEFFICIENT   STD. ERROR   STD REG   TOLERANCE   F TO REMOVE   LEVEL
                                 OF COEFF     COEFF
   (Y-INTERCEPT    0.484)
   X1  1           0.069         0.004        0.405     0.97011      288.23         1
   X2  2           0.009         0.000        0.574     0.99178      590.58         1
   X3  3           0.010         0.000        0.739     0.97751      966.28         1

VARIABLES NOT IN EQUATION
   VARIABLE   PARTIAL CORR.   TOLERANCE   F TO ENTER   LEVEL
   X4  4         0.02833       0.39134        0.04       1

***** F-LEVELS( 4.000, 3.900) OR TOLERANCE INSUFFICIENT FOR FURTHER STEPPING



Next, all regression models containing X4 and another independent variable
are fitted, and the F* statistics calculated. They are now:

F_k^* = \frac{MSR(X_k \mid X_4)}{MSE(X_4, X_k)}

These statistics are shown in step 1 under the heading “Variables not in
equation.” X3 has the highest F* value, which exceeds 4.0, so that X3 now enters
the model.

3. Step 2 in Figure 12.4 summarizes the situation at this point. X3 and X4 are
now in the model, and information about this model is provided. Next, a test
whether X4 should be dropped is undertaken. The F* statistic is shown under the
heading “Variables in equation” and is called “F to remove”:

F_4^* = \frac{MSR(X_4 \mid X_3)}{MSE(X_3, X_4)} = 39.72

Since this F* value exceeds the maximum acceptable F-to-remove value 3.9, X4
is not dropped.

4. Next, all regression models containing X3, X4, and one of the remaining
potential X variables are fitted. The appropriate F* statistics now are:

F_k^* = \frac{MSR(X_k \mid X_3, X_4)}{MSE(X_3, X_4, X_k)}

These statistics are shown in step 2 under the heading “Variables not in
equation.” X2 has the highest F* value, which exceeds 4.0, so that X2 now enters
the model.

5. Step 3 in Figure 12.4 summarizes the situation at this point. X2, X3, and X4
are now in the model. Next, a test is undertaken whether X3 or X4 should be
dropped. The F* statistics to remove a variable are shown under the heading
“Variables in equation” in step 3. F*_4 is smallest:

F_4^* = \frac{MSR(X_4 \mid X_2, X_3)}{MSE(X_2, X_3, X_4)} = 29.86

Since its value exceeds 3.9, X4 is not dropped from the model.

6. At this point, only X1 remains in the potential pool. Its F* value to enter
exceeds 4.0 (see “Variables not in equation” under step 3), so X1 is entered into
the model.

7. Step 4 in Figure 12.4 summarizes the addition of variable X1 into the
model containing variables X2, X3, and X4. Next, a test is undertaken to
determine whether either X2, X3, or X4 should be dropped. The F* statistics are
shown under the heading “Variables in equation” in step 4. Note that:

F_4^* = \frac{MSR(X_4 \mid X_1, X_2, X_3)}{MSE(X_1, X_2, X_3, X_4)} = .04

is smallest, and that its value is less than 3.9; hence, X4 is deleted.

8. Step 5 summarizes the dropping of X4 from the model. Since the only
potential variable remaining is X4, which has just been dropped, it cannot enter
the regression now. The algorithm therefore next considers the “F to remove”
values in step 5, which indicate that F*_1 is smallest. However, since its value
exceeds 3.9, X1 is not dropped from the model and the search process is
terminated.

Thus, the stepwise search algorithm identifies (X1, X2, X3) as the “best”
subset of X variables, a result that happens to be consistent with our previous
analyses based on the all-possible-regressions approach.
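
The “F to enter” and “F to remove” values traced above can be reproduced from the residual sums of squares printed in Figure 12.4, since the extra sum of squares for one variable is the difference between the reduced-model and full-model SSE. The following is a minimal sketch, not part of the original text; the function name `partial_f` is illustrative.

```python
def partial_f(sse_reduced: float, sse_full: float, df_full: int) -> float:
    """F* = [SSE(reduced) - SSE(full)] / MSE(full) for a single variable."""
    return (sse_reduced - sse_full) / (sse_full / df_full)


# Step 0: F to enter for X4 (reduced model is the intercept-only model).
f_enter_x4 = partial_f(3.9727688, 1.8776188, 52)
# Step 1: F to enter for X3, given X4.
f_enter_x3 = partial_f(1.8776188, 1.2453051, 51)
# Step 4: F to remove for X4, given X1, X2, X3.
f_remove_x4 = partial_f(0.10983557, 0.10974741, 49)

print(round(f_enter_x4, 2), round(f_enter_x3, 2), round(f_remove_x4, 2))
```

The last value shows how little X4 contributes once X1, X2, and X3 are present: removing it raises the error sum of squares by less than 0.0001.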


Comments 

1.  In  the  surgical  unit  example,  the  tolerance  requirement  was  always  met;  hence,  no 
variable  was  excluded  from  the  model  as  a  result  of  too  high  a  correlation  with  the  other  X 
variables  in  the  model. 




2. Variations of the rules for entering and removing variables illustrated in the
example are possible. For instance, different F-to-enter and F-to-remove values can
be employed in accordance with the degrees of freedom associated with MSE in the
F* statistic. However, this refinement often is not utilized, and fixed values are
employed instead since the repeated testing in the search procedure does not permit
precise probabilistic interpretations.

3. The minimum acceptable F-to-enter value should never be smaller than the
maximum acceptable F-to-remove value; otherwise cycling is possible where a
variable is continually entered and removed.

4. The order in which variables enter the regression model does not reflect their
importance. In our surgical unit example, for instance, X4 was the first variable to
enter the model yet it was eventually dropped.

5.  The  stepwise  regression  routine  we  employed  prints  out  the  partial  correlation 
coefficients  at  each  stage.  These  could  be  used  equivalently  to  the  F*  values  for  screening 
the  X  variables,  and  indeed  some  routines  actually  use  the  partial  correlation  coefficients 
for  screening. 

6.  The  F  limits  for  adding  and  deleting  a  variable  need  not  be  selected  in  terms  of 
approximate  significance  levels,  but  may  be  determined  descriptively  in  terms  of  error 
reduction.  For  instance,  an  F  limit  of  2.0  for  adding  a  variable  may  be  specified  with  the 
thought  that  the  marginal  error  reduction  associated  with  the  added  variable  should  be  at 
least  twice  as  great  as  the  remaining  error  mean  square  once  that  variable  has  been  added. 

7. A limitation of the stepwise regression search approach is that it presumes there
is a single “best” subset of X variables and seeks to identify it. As noted earlier,
there is often no unique “best” subset. Hence, some statisticians suggest that all
possible regression models with a similar number of X variables as in the stepwise
regression solution be fitted subsequently to study whether some other subsets of X
variables might be better.

Another  limitation  of  the  stepwise  regression  routine  is  that  it  sometimes  arrives  at  an 
unreasonable  “best”  subset  when  the  X  variables  are  very  highly  correlated. 
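
The equivalence mentioned in comment 5 rests on a standard identity that the text does not write out: for a single variable, F* = [r^2 / (1 - r^2)] × (error degrees of freedom), where r is the coefficient of partial correlation. The following is a minimal sketch checking this against the partial correlations printed in Figure 12.4; the function name is illustrative.

```python
def f_from_partial_corr(r: float, df_error: int) -> float:
    """F* implied by a partial correlation r with df_error error degrees of freedom."""
    return (r * r / (1.0 - r * r)) * df_error


# (partial correlation, error df after entry, F to enter printed in Figure 12.4)
checks = [
    (0.72621, 52, 58.02),   # X4 at step 0
    (0.58031, 51, 25.90),   # X3 at step 1, given X4
    (0.79149, 50, 83.85),   # X2 at step 2, given X3 and X4
]
for r, df, printed in checks:
    print(round(f_from_partial_corr(r, df), 2), printed)
```

Because F* increases with the absolute value of r for fixed degrees of freedom, ranking the candidate variables by partial correlation or by F* gives the same choice at each step.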


Other  automatic  search  procedures 

There  are  a  number  of  other  automatic  search  procedures  which  have  been 
proposed  to  find  a  “best”  subset  of  independent  variables.  We  mention  two  of 
these.  Neither  of  the  two  methods,  however,  has  gained  the  acceptance  of  the 
stepwise  search  procedure. 

Forward selection. This search procedure is a simplified version of stepwise
regression, omitting the test whether a variable once entered into the model
should be dropped.


Backward elimination. This search procedure is the opposite of forward
selection. It begins with the model containing all potential X variables and
identifies the one with the smallest F* value. For instance, the F* value for X1 is:

(12.18)  F_1^* = \frac{MSR(X_1 \mid X_2, \ldots, X_{P-1})}{MSE(X_1, \ldots, X_{P-1})}


If the minimum F* value is less than a predetermined limit, that independent
variable is dropped. The model with the remaining P - 2 predictor variables is
then fitted, and the next candidate for dropping is identified. This process
continues until no further independent variables can be dropped.

The  backward  elimination  procedure  requires  more  computations  than  the 
forward  selection  method  since  it  starts  with  the  biggest  possible  model.  How¬ 
ever,  it  does  have  the  advantage  of  showing  users  the  implications  of  models 
with  many  variables. 
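
The backward elimination loop described above can be sketched as follows. This is a minimal illustration, not a package implementation: the function names `sse` and `backward_elimination`, the default limit of 4.0, and the small orthogonal data set are all assumptions introduced here. Each partial F* is the extra sum of squares for a variable, given the others in the current model, divided by the MSE of the current model.

```python
import numpy as np

def sse(A, y):
    """Error sum of squares for a least squares fit of y on the columns of A."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def backward_elimination(X, y, f_to_remove=4.0):
    """Return the column indices of X retained by backward elimination."""
    n = X.shape[0]
    kept = list(range(X.shape[1]))
    while kept:
        full = np.column_stack([np.ones(n), X[:, kept]])
        sse_full = sse(full, y)
        mse_full = sse_full / (n - full.shape[1])
        # Partial F* for each variable still in the model.
        f_stats = []
        for j in kept:
            others = [k for k in kept if k != j]
            reduced = np.column_stack([np.ones(n)] + [X[:, k] for k in others])
            f_stats.append((sse(reduced, y) - sse_full) / mse_full)
        worst = int(np.argmin(f_stats))
        if f_stats[worst] >= f_to_remove:
            break          # every remaining variable passes the limit
        kept.pop(worst)    # drop the variable with the smallest F*
    return kept

# Hypothetical data: three orthogonal predictors; Y depends on the first and
# third only, plus a disturbance orthogonal to all three.
x0 = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
x1 = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
x2 = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
X = np.column_stack([x0, x1, x2])
y = 10 + 3 * x0 + 2 * x2 + 0.5 * (x0 * x1 * x2)

selected = backward_elimination(X, y)
print(selected)
```

With these made-up data the middle predictor has a partial F* of essentially zero and is the only one dropped.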

12.5  SELECTION  OF  VARIABLES  WITH  RIDGE  REGRESSION 

In  Section  11.4,  we  discussed  the  use  of  ridge  regression  for  helping  to  over¬ 
come  problems  related  to  multicollinearities  among  the  X  variables.  The  ridge 
trace  mentioned  there  (Figure  11.2,  p.  399)  can  also  be  used  to  identify  variables 
which  might  be  dropped  from  the  regression  model.  It  has  been  suggested  that 
variables  be  dropped  whose  ridge  trace  is  unstable,  with  the  coefficient  tending 
toward  the  value  of  zero.  Also,  variables  should  be  dropped  whose  ridge  trace  is 
stable  but  at  a  very  small  value.  Finally,  variables  with  unstable  ridge  traces  that 
do  not  tend  toward  zero  should  be  considered  as  candidates  for  dropping. 
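
A ridge trace of the kind referred to above can be computed with a few lines of code. The sketch below assumes the standardized form of the ridge estimator, $(r_{XX} + cI)^{-1} r_{YX}$, evaluated over a grid of biasing constants c; whether this matches the exact form used in Section 11.4 is an assumption, and the function name `ridge_trace`, the grid of c values, and the simulated data are hypothetical.

```python
import numpy as np

def ridge_trace(X, y, c_values):
    """Standardized ridge coefficients for each biasing constant c (one row per c)."""
    n, p = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    w = (y - y.mean()) / y.std(ddof=1)
    r_xx = Z.T @ Z / (n - 1)   # correlation matrix of the X variables
    r_yx = Z.T @ w / (n - 1)   # correlations between Y and each X
    return np.array([np.linalg.solve(r_xx + c * np.eye(p), r_yx)
                     for c in c_values])

# Hypothetical data with two highly correlated predictors.
rng = np.random.default_rng(1)
x1 = rng.normal(size=40)
x2 = x1 + 0.05 * rng.normal(size=40)
x3 = rng.normal(size=40)
X = np.column_stack([x1, x2, x3])
y = 2 * x1 + x3 + rng.normal(size=40)

c_values = [0.0, 0.01, 0.1, 1.0]
trace = ridge_trace(X, y, c_values)
for c, coefs in zip(c_values, trace):
    print(c, np.round(coefs, 3))
```

Plotting each column of `trace` against c gives the ridge trace; the judgments about which traces are "unstable" or "stable at a very small value" remain subjective.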

12.6  IMPLEMENTATION  OF  SELECTION  PROCEDURES 

Options  and  refinements 

Our  discussion  of  the  major  selection  procedures  for  identifying  “good”  sets 
of  X  variables  has  focused  on  the  main  conceptual  issues  and  not  on  options, 
variations,  and  refinements  available  with  particular  computer  packages.  It  is 
essential  that  the  specific  features  of  the  package  to  be  employed  are  fully  under¬ 
stood  so  that  intelligent  use  of  the  package  can  be  made.  In  some  packages,  there 
is  an  option  for  regression  models  through  the  origin.  Some  packages  permit 
variables  to  be  brought  into  the  model  and  tested  in  pairs  or  other  groupings 
instead  of  singly,  to  save  computing  time  or  for  other  reasons.  Some  packages, 
once  a  “best”  regression  model  is  identified,  will  fit  all  the  possible  regression 
models  in  the  same  number  of  variables  and  will  develop  information  for  each 
model  so  that  a  final  choice  can  be  made  by  the  user.  Some  stepwise  programs 
have  options  for  forcing  variables  into  the  regression  model;  such  variables  are 
not  removed  even  if  their  F*  values  become  too  low. 

The  diversity  of  these  options  and  special  features  serves  to  emphasize  a  point 
made  earlier:  there  is  no  unique  way  of  searching  for  “good”  subsets  of  X 
variables,  and  subjective  elements  must  play  an  important  role  in  the  search 
process. 

Completion  of  model  building  process 

The screening of variables by a computerized selection process is only one
step in the building of a regression model. Once the set of X variables has been
identified, the resulting model needs to be studied for its aptness by the methods
of Chapters 4 and 11. If repeat observations are available, a formal test for lack
of fit can be made. In any case, a variety of residual plots and analyses can be
employed to identify the nature of lack of fit, outliers, and influential
observations. When the original set of P - 1 potential X variables excludes
cross-product terms and powers of the independent variables to keep the selection
problem within reasonable bounds, residual plots against such “missing”
variables, or augmenting the model of “best” independent variables by adding
cross-product and/or power terms, can be useful in identifying ways in which the
model fit can be improved further.


Cautions  in  use  of  final  model 

The  model-building  process,  as  we  have  just  noted,  requires  repeated  analyses 
on  the  same  set  of  data  in  order  to  arrive  at  a  model  which  fits  the  data  well.  A 
consequence  is  that  the  model  may  be  subject  to  prediction  bias,  i.e.,  the  indi¬ 
cated  predictive  ability  of  the  model  for  the  data  on  which  the  model  is  based 
may  be  greater  than  the  model’s  predictive  ability  for  new  data.  The  prediction 
bias  arises  because  the  choice  of  the  final  model  is  so  uniquely  related  to  the 
observations  at  hand.  The  prediction  bias  may  be  particularly  large  when  the 
effects  of  independent  variables  are  small. 

It  is  good  statistical  practice  to  measure  the  prediction  bias  by  observing  the 
predictive  power  of  the  model  on  a  new  set  of  data.  If  necessary,  some  of  the 
original  data  can  be  kept  aside  for  this  calibration  of  predictive  power  and  the 
model  derived  only  from  the  remaining  data. 
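
The hold-out check described above can be sketched as follows, assuming the selected model is refit by ordinary least squares on the retained cases only. The function name `holdout_check` and the toy data are hypothetical; the comparison of MSE on the fitting data with the mean squared prediction error on the held-out cases is the one suggested in the text.

```python
import numpy as np

def holdout_check(X, y, n_fit):
    """Fit on the first n_fit cases; return (MSE on those cases,
    mean squared prediction error on the remaining cases)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A[:n_fit], y[:n_fit], rcond=None)
    resid_fit = y[:n_fit] - A[:n_fit] @ beta
    mse = float(resid_fit @ resid_fit) / (n_fit - A.shape[1])
    resid_new = y[n_fit:] - A[n_fit:] @ beta
    mspr = float(resid_new @ resid_new) / len(resid_new)
    return mse, mspr

# Hypothetical data: an exact line for the fitting cases, and held-out cases
# that sit 3 units above that line.
x = np.arange(20, dtype=float)
y = 1.0 + 2.0 * x
y[10:] += 3.0

mse, mspr = holdout_check(x, y, n_fit=10)
print(mse, mspr)
```

A mean squared prediction error well above the MSE of the fitting data, as in this contrived case, is the kind of evidence of prediction bias the text has in mind.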

Often,  a  predictive  model  is  desired  for  values  of  the  independent  variables 
which  cover  only  a  portion  of  the  entire  observation  space.  In  that  case,  it  is  good 
practice  to  test  the  stability  of  the  regression  model  by  fitting  it  to  that  portion  of 
the  observations  which  fall  in  the  space  of  future  interest  and  comparing  the 
regression  results  with  those  for  the  model  based  on  all  observations.  Similarly, 
if  the  data  are  time  series,  it  is  often  desirable  to  study  the  stability  of  the  regres¬ 
sion  model  over  time  by  fitting  the  model  also  to  the  most  recent  data  alone,  and 
comparing  results. 

In  this  connection,  it  is  worthwhile  repeating  an  earlier  caution.  When  the 
independent  variables  are  highly  intercorrelated,  use  of  the  model  for  prediction 
for  values  of  the  independent  variables  that  do  not  follow  the  past  pattern  of 
multicollinearity  becomes  highly  suspect. 


PROBLEMS 

12.1.  A  speaker  stated:  “In  a  well-designed  experiment  involving  quantitative  inde¬ 
pendent  variables,  a  procedure  for  screening  the  independent  variables  after  the 
observations  are  obtained  is  not  necessary.”  Discuss. 

12.2.  An  educational  researcher  wishes  to  predict  the  grade  point  average  in  graduate 
work  for  applicants  to  the  Graduate  School.  List  a  dozen  variables  that  might  be 
useful  independent  variables  here. 


12.3. Agency revenues. An economic consultant was retained by a large
employment agency in a metropolitan area to develop a regression model for
predicting monthly agency revenues (Y). She decided that three economic
indicators for the area were potentially useful as independent variables, namely,
average weekly overtime hours of production workers in manufacturing ($X_1$),
number of job vacancies in manufacturing ($X_2$), and index of help wanted
advertising in newspapers ($X_3$). Monthly observations on agency revenues and
the three independent variables (all seasonally adjusted) were obtained for the
past 25 months. The ANOVA table for the model
$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i$
is as follows:


Source of
Variation         SS        df        MS

Regression     5,409.89      3     1,803.30
Error             16.35     21          .78
Total          5,426.24     24

a.  Test  to  determine  whether  a  regression  relation  exists.  Use  a  level  of  signifi¬ 
cance  of  .05.  State  the  alternatives,  decision  rule,  and  conclusion. 

b.  If  a  regression  relation  had  not  existed,  what  would  this  imply  about  screen¬ 
ing  the  independent  variables? 

12.4.  Refer  to  Agency  revenues  Problem  12.3.  The  consultant  decided  to  screen  the 
independent  variables  to  determine  the  best  set  for  predicting  agency  revenues. 
The  regression  sums  of  squares  for  all  possible  regression  models  were  found  to 
be  as  follows: 


Independent                      Independent
Variables                        Variables
in Model          SSR            in Model           SSR

X1             2,970.64          X1, X2          5,123.80
X2             3,654.85          X1, X3          5,409.59
X3             3,584.54          X2, X3          3,741.30
                                 X1, X2, X3      5,409.89

a.  Indicate which subset of independent variables is best for predicting Y
according to each of the following criteria: (1) $R_p^2$, (2) $MSE_p$, (3) $C_p$.
Support your recommendations with appropriate graphs.

b.  Did  the  three  criteria  in  part  (a)  identify  the  same  best  subset?  Does  this 
always  happen? 

c.  Would  stepwise  regression  have  any  advantages  here  as  a  screening  proce¬ 
dure  over  all  possible  regressions? 

d.  An  observer  states:  “There  are  only  three  variables,  so  why  screen?  You 
might  as  well  use  all  three.”  Discuss. 

12.5. Refer to Patient satisfaction Problem 7.17. The ANOVA table for the model
$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i$
is as follows:


Source of
Variation         SS        df        MS

Regression     4,133.62      3     1,377.87
Error          2,011.59     19       105.87
Total          6,145.21     22


The  hospital  administrator  decided  to  screen  the  independent  variables  to  deter¬ 
mine  the  best  subset  for  predicting  patient  satisfaction.  The  regression  sums  of 
squares  for  all  possible  regression  models  are  as  follows: 


Independent                      Independent
Variables                        Variables
in Model          SSR            in Model           SSR

X1             3,678.44          X1, X2          4,081.21
X2             2,120.61          X1, X3          3,608.17
X3             2,229.32          X2, X3          2,426.92
                                 X1, X2, X3      4,133.62

a.  Indicate which subset of independent variables you would recommend as
best for predicting Y according to each of the following criteria: (1) $R_p^2$,
(2) $MSE_p$, (3) $C_p$. Support your recommendations with appropriate graphs.

b.  Did  the  three  criteria  in  part  (a)  identify  the  same  best  subset?  Does  this 
always  happen? 

c.  Would  stepwise  regression  have  any  advantages  here  as  a  screening  proce¬ 
dure  over  all  possible  regressions? 

12.6. Roofing shingles. Data on sales last year (Y, in thousand squares) in 20 sales
districts are given below for a maker of asphalt roofing shingles. Shown also are
promotional expenditures ($X_1$, in thousand dollars), number of active accounts
($X_2$), number of competing brands ($X_3$), and district potential ($X_4$, coded)
for each of the districts.


District
   i      X_i1    X_i2    X_i3    X_i4      Y_i

   1       5.5     31      10       8       79.3
   2       2.5     55       8       6      200.1
   3       8.0     67      12       9      163.2
   4       3.0     50       7      16      200.1
   5       3.0     38       8      15      146.0
   6       2.9     71      12      17      177.7
   7       8.0     30      12       8       30.9
   8       9.0     56       5      10      291.9
   9       4.0     42       8       4      160.0
  10       6.5     73       5      16      339.4
  11       5.5     60      11       7      159.6
  12       5.0     44      12      12       86.3
  13       6.0     50       6       6      237.5
  14       5.0     39      10       4      107.2
  15       3.5     55      10       4      155.0
  16       8.0     70       6      14      291.4
  17       6.0     40      11       6      100.2
  18       4.0     50      11       8      135.8
  19       7.5     62       9      13      223.3
  20       7.0     59       9      11      195.0

It  is  believed  that  a  regression  model  containing  only  first-order  terms  and  no 
interaction  terms  will  be  appropriate. 

a.  Find  the  three  best  subsets  according  to  the  Cp  criterion.  Is  there  relatively 
little  bias  in  the  subset  model  with  the  smallest  Cp  value? 

b.  For the subset model with the smallest $C_p$ value, obtain the residuals and
plot them against $\hat{Y}$ and each of the independent variables in the subset
model on separate graphs. Also prepare a normal probability plot. On the
basis of your plots, do you suggest any modifications in the model to be
employed?

12.7. Job proficiency. A personnel officer in a governmental agency administered
four newly developed aptitude tests to each of 25 applicants for entry-level
clerical positions in the agency. For purposes of the study, all 25 applicants were
accepted for positions irrespective of their test scores. After a probationary
period, each applicant was rated for proficiency on the job. The scores on the four
tests ($X_1$, $X_2$, $X_3$, $X_4$) and the job proficiency score (Y) for the 25
employees were as follows:


                                           Job Proficiency
                    Test Score                  Score
Subject     X1      X2      X3      X4            Y

   1        86     110     100      87            88
   2        62      97      99     100            80
   3       110     107     103     103            96
   4       101     117      93      95            76
   5       100     101      95      88            80
   6        78      85      95      84            73
   7       120      77      80      74            58
   8       105     122     116     102           116
   9       112     119     106     105           104
  10       120      89     105      97            99
  11        87      81      90      88            64
  12       133     120     113     108           126
  13       140     121      96      89            94
  14        84     113      98      78            71
  15       106     102     109     109           111
  16       109     129     102     108           109
  17       104      83     100     102           100
  18       150     118     107     110           127
  19        98     125     108      95            99
  20       120      94      95      90            82
  21        74     121      91      85            67
  22        96     114     114     103           109
  23       104      73      93      80            78
  24        94     121     115     104           115
  25        91     129      97      83            83

It  is  expected  that  a  regression  model  containing  only  first-order  terms  and  no 
interaction  terms  will  be  appropriate. 

a.  Find  the  three  best  subsets  according  to  the  Cp  criterion.  Is  there  relatively 
little  bias  in  the  subset  model  with  the  smallest  Cp  value? 

b.  For the subset model with the smallest $C_p$ value, obtain the residuals and
plot them against $\hat{Y}$ and each of the independent variables in the subset
model on separate graphs. Also prepare a normal probability plot. On the
basis of your plots, do you suggest any modifications in the model to be
employed?

12.8.  Two  researchers  investigated  factors  affecting  summer  attendance  at  privately 
operated  beaches  on  Lake  Ontario,  and  collected  information  on  attendance  and 
11  explanatory  variables  for  42  beaches.  Two  summers  were  studied,  of  rela¬ 
tively  hot  and  relatively  cool  weather,  respectively.  A  “best”  subsets  algorithm 
now  is  to  be  used  to  screen  the  potential  independent  variables. 


a.  Should  the  screening  be  done  for  both  summers  combined  or  should  it  be 
done  separately  for  each  summer?  Explain  the  problems  involved  and  how 
you  might  handle  them. 

b.  Will  the  “best”  subsets  screening  procedure  select  those  independent  varia¬ 
bles  that  are  most  important  in  a  causal  sense  for  determining  beach  attend¬ 
ance? 

12.9.  In  stepwise  regression,  what  advantage  is  there  in  using  a  relatively  large  F  limit 
for  adding  variables?  What  advantage  is  there  in  using  a  smaller  F  limit  for 
adding  variables? 

12.10.  In  stepwise  regression,  why  should  the  F  limit  for  deleting  variables  never  ex¬ 
ceed  the  F  limit  for  adding  variables? 

12.11.  Draw  a  flowchart  of  each  of  the  following  selection  methods:  (1)  stepwise  re¬ 
gression,  (2)  forward  selection,  (3)  backward  elimination. 

12.12.  Refer  to  Agency  revenues  Problems  12.3  and  12.4.  The  consultant  was  inter¬ 
ested  to  learn  how  the  stepwise  selection  procedure  and  some  of  its  variations 
would  perform  in  this  application. 

a.  Determine  the  subset  of  variables  that  is  selected  as  best  by  the  stepwise 
regression  procedure  using  F  limits  of  4.2  and  4.1  to  add  or  delete  a  varia¬ 
ble,  respectively.  Show  your  steps. 

b.  To  what  level  of  significance  in  any  individual  test  is  the  F  limit  of  4.2  for 
adding  a  variable  approximately  equivalent  here? 

c.  Determine  the  subset  of  variables  that  is  selected  as  best  by  the  forward 
selection  procedure  using  an  F  limit  of  4.2  to  add  a  variable.  Show  your 
steps. 

d.  Determine  the  subset  of  variables  that  is  selected  as  best  by  the  backward 
elimination  procedure  using  an  F  limit  of  4.1  to  delete  a  variable.  Show 
your  steps. 

e.  Compare  the  results  of  the  three  selection  procedures.  How  consistent  are 
these  results?  How  do  the  results  compare  with  those  for  all  possible  regres¬ 
sions  in  Problem  12.4? 

12.13. Refer to Agency revenues Problem 12.12a. Suppose the consultant “forced” $X_2$
into the best subset for administrative reasons by arbitrarily entering it first and
not removing it even if its F* value becomes too low. Which subset of variables
(including $X_2$) is now selected as best by the stepwise regression procedure if F
limits of 4.2 and 4.1 are used to add or delete a variable, respectively? Did the
forced inclusion of $X_2$ affect the selection of the other variables included in the
best subset? Will this always happen?

12.14.  Refer  to  Patient  satisfaction  Problems  7.17  and  12.5.  The  hospital  administra¬ 
tor  was  interested  to  learn  how  the  stepwise  selection  procedure  and  some  of  its 
variations  would  perform  here. 

a.  Determine  the  subset  of  variables  that  is  selected  as  best  by  the  stepwise 
regression  procedure  using  F  limits  of  3.0  and  2.9  to  add  or  delete  a  varia¬ 
ble,  respectively.  Show  your  steps. 

b.  To  what  level  of  significance  in  any  individual  test  is  the  F  limit  of  3.0  for 
adding  a  variable  approximately  equivalent  here? 

c.  Determine the subset of variables that is selected as best by the forward
selection procedure using an F limit of 3.0 to add a variable. Show your
steps.

d.  Determine  the  subset  of  variables  that  is  selected  as  best  by  the  backward 
elimination  procedure  using  an  F  limit  of  2.9  to  delete  a  variable.  Show 
your  steps. 

e.  Compare  the  results  of  the  three  selection  procedures.  How  consistent  are 
these  results?  How  do  the  results  compare  with  those  for  all  possible  regres¬ 
sions  in  Problem  12.5? 

12.15.  Refer  to  Roofing  shingles  Problem  12.6. 

a.  Using  stepwise  regression,  find  the  best  subset  of  independent  variables  to 
predict  sales.  Use  F  limits  for  adding  or  deleting  a  variable  of  4.0  and  3.9, 
respectively. 

b.  How  does  the  best  subset  according  to  stepwise  regression  compare  with  the 
best  subset  according  to  the  Cp  criterion  obtained  in  Problem  12.6a? 

12.16.  Refer  to  Job  proficiency  Problem  12.7. 

a.  Using  stepwise  regression,  find  the  best  subset  of  independent  variables  to 
predict  job  proficiency.  Use  F  limits  of  3.5  and  3.4  for  adding  or  deleting  a 
variable,  respectively. 

b.  How  does  the  best  subset  according  to  stepwise  regression  compare  with  the 
best  subset  according  to  the  Cp  criterion  obtained  in  Problem  12.7a? 

12.17. An engineer has stated: “Screening of variables should always be done using the
objective stepwise regression procedure.” Discuss.


EXERCISES 

12.18. The true quadratic regression function is $E(Y) = 15 + 20X + 3X^2$. The fitted
linear regression function is $\hat{Y} = 13 + 40X$, for which $E(b_0) = 10$ and
$E(b_1) = 45$. What are the bias and sampling error components of the mean squared
error for $X_i = 10$? For $X_i = 20$?

12.19.  Prove  (12.12).  [Hint:  Use  Exercise  6.33  and  (11.50).] 

12.20. Refer to (12.16). Show that the same variable $X_k$ that maximizes the test statistic
F*  also  maximizes  the  coefficient  of  partial  determination  r\k  l. 


PROJECTS 

12.21. Refer to the SENIC data set. Length of stay (Y) is to be predicted, and the pool of
potential independent variables includes all other variables in the data set except
medical school affiliation and region. It is believed that a model with $\log_{10} Y$ as
the dependent variable and the independent variables in first-order terms with no
interaction terms will be appropriate.

a.  Using  the  Cp  criterion,  obtain  the  three  best  subsets.  Which  of  these  subset 
models  appears  to  have  the  smallest  bias?  Which  of  the  three  models  would 
you  recommend  as  best? 


b.  Divide the data set into two halves by considering the first 56 observations
as one half and the remaining 57 observations as the other half. Fit the
regression model recommended as best in part (a) using the first 56
observations. Then obtain the deviations of the remaining 57 observations from
their respective “predicted” values, i.e., obtain $Y_i - \hat{Y}_i$. How well does
the model perform on the hold-out (validation) sample? Calculate
$\sum (Y_i - \hat{Y}_i)^2 / n$ for the last 57 observations and compare it with MSE
for the first 56 observations. Is there any evidence of a large prediction bias?

c.  For  the  recommended  subset  model  in  part  (a),  obtain  the  residuals  and  plot 
them  against  the  fitted  values  and  each  of  the  independent  variables  in  the 
subset  model.  Also  prepare  a  normal  probability  plot.  Do  these  plots  suggest 
any  modifications  in  the  model? 

d.  Would  stepwise  regression  with  F  limits  of  3.5  and  3.4  for  adding  or  delet¬ 
ing  a  variable,  respectively,  lead  to  the  same  best  model  as  the  all-possible- 
regressions  approach? 

12.22. Refer to the SMSA data set. A public safety official wishes to predict the rate of
serious crimes in an SMSA (Y, total number of serious crimes per 100,000
population). The pool of potential independent variables includes all other
variables in the data set except region.

a.  Using  the  Cp  criterion,  obtain  the  three  best  subsets.  Which  of  these  subset 
models  appears  to  have  the  smallest  bias?  Which  of  the  three  models  would 
you  recommend  as  best? 

b.  Divide the data set into two halves by considering the odd-numbered
observations as one half and the even-numbered observations as the other half. Fit
the regression model recommended as best in part (a) using the odd-numbered
observations. Then obtain the deviations of the even-numbered observations
from their respective “predicted” values, i.e., obtain $Y_i - \hat{Y}_i$. How
well does the model perform on the hold-out (validation) sample? Calculate
$\sum (Y_i - \hat{Y}_i)^2 / n$ for the even-numbered observations and compare it
with MSE for the odd-numbered observations. Is there any evidence of a large
prediction bias?

c.  For  the  recommended  subset  model  in  part  (a),  obtain  the  residuals  and  plot 
them  against  the  fitted  values  and  each  of  the  independent  variables  in  the 
subset  model.  Also  prepare  a  normal  probability  plot.  Do  these  plots  suggest 
any  modifications  in  the  model? 

d.  Would  stepwise  regression  with  F-to-enter  and  F-to-remove  values  of  4.0 
and  3.9,  respectively,  lead  to  the  same  best  model  as  the  all-possible- 
regressions  approach? 


CITED  REFERENCES 

12.1  Daniel, Cuthbert, and Fred S. Wood. Fitting Equations to Data. 2d ed. New York:
Wiley-Interscience, 1980.

12.2  Dixon, W. J., and M. B. Brown, eds. BMDP-81, Biomedical Computer
Programs, P-Series. Berkeley, Calif.: University of California Press, 1981.


13 


Autocorrelation  in 
time  series  data 


The basic regression models considered so far have assumed that the random
error terms $\varepsilon_t$ are either uncorrelated random variables or independent normal
random  variables.  In  business  and  economics,  many  regression  applications  in¬ 
volve  time  series  data.  For  such  data,  the  assumption  of  uncorrelated  or  inde¬ 
pendent  error  terms  is  often  not  appropriate;  rather,  the  error  terms  are  frequently 
correlated  positively  over  time.  Error  terms  correlated  over  time  are  said  to  be 
autocorrelated  or  serially  correlated. 

A  major  cause  of  positively  autocorrelated  error  terms  in  business  and  eco¬ 
nomic  regression  applications  involving  time  series  data  is  the  omission  of  one  or 
several  key  variables  from  the  model.  When  time-ordered  effects  of  such  “miss¬ 
ing”  key  variables  are  positively  correlated,  the  error  terms  in  the  regression 
model  will  tend  to  be  positively  autocorrelated  since  the  error  terms  include 
effects  of  missing  variables.  Suppose,  for  example,  that  annual  sales  of  a  product 
are  regressed  against  average  yearly  price  over  a  period  of  30  years.  If  population 
size  has  an  important  effect  on  sales,  its  omission  from  the  model  may  lead  to  the 
error  terms  being  positively  autocorrelated  because  the  effect  of  population  size 
on  sales  likely  is  positively  correlated  over  time. 

Another  common  cause  of  positively  autocorrelated  error  terms  in  economic 
data  is  systematic  coverage  errors  in  the  dependent  variable  time  series,  which 
errors  often  tend  to  be  positively  correlated  over  time. 


13.1  PROBLEMS  OF  AUTOCORRELATION 

If  the  error  terms  in  the  regression  model  are  positively  autocorrelated,  the  use 
of  ordinary  least  squares  procedures  has  a  number  of  important  consequences. 
We  summarize  these  first,  and  then  discuss  them  in  more  detail: 

1.  The ordinary least squares regression coefficients are still unbiased, but they
no  longer  have  the  minimum  variance  property  and  may  be  quite  inefficient. 

2.  MSE  may  seriously  underestimate  the  variance  of  the  error  terms. 

3.  $s(b_k)$ calculated according to ordinary least squares procedures may
seriously underestimate the true standard deviation of the estimated regression
coefficient with that procedure.

4.  The  confidence  intervals  and  tests  using  the  t  and  F  distributions,  discussed 
earlier,  are  no  longer  strictly  applicable. 

To  illustrate  these  problems  intuitively,  we  shall  consider  the  simple  linear 
regression  model  with  time  series  data: 

$Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t$

Here, $Y_t$ and $X_t$ are observations for period t. Let us assume that the error
terms $\varepsilon_t$ are positively autocorrelated as follows:

$\varepsilon_t = \varepsilon_{t-1} + u_t$

The $u_t$, called disturbances, are independent normal random variables. Thus, any
error term $\varepsilon_t$ is the sum of the previous error term $\varepsilon_{t-1}$ and a new
disturbance term $u_t$. We shall assume here that the $u_t$ have mean 0 and variance 1.

In Table 13.1, column 1, we show 10 random observations on the normal
variable $u_t$ with mean 0 and variance 1, obtained from a standard normal random
numbers generator. Suppose now that $\varepsilon_0 = 3.0$; we obtain then:

$\varepsilon_1 = \varepsilon_0 + u_1 = 3.0 + .5 = 3.5$
$\varepsilon_2 = \varepsilon_1 + u_2 = 3.5 - .7 = 2.8$
etc.

TABLE 13.1  Example of positively autocorrelated error terms

         (1)                  (2)
  t      u_t      e_{t-1}  +  u_t  =  e_t      (e denotes the error term)

  0       -                            3.0
  1      +.5        3.0   +   .5   =   3.5
  2      -.7        3.5   -   .7   =   2.8
  3      +.3        2.8   +   .3   =   3.1
  4        0        3.1   +    0   =   3.1
  5     -2.3        3.1   -  2.3   =    .8
  6     -1.9         .8   -  1.9   =  -1.1
  7      +.2       -1.1   +   .2   =  - .9
  8      -.3       - .9   -   .3   =  -1.2
  9      +.2       -1.2   +   .2   =  -1.0
 10      -.1       -1.0   -   .1   =  -1.1

The error terms $\varepsilon_t$ are shown in Table 13.1, column 2, and they are plotted in
Figure 13.1. Note the systematic pattern in these error terms. Their positive
relation over time is shown by the fact that adjacent error terms tend to be of the
same magnitude.
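
Column 2 of Table 13.1 is simply a running sum of the disturbances, started at the assumed initial value of 3.0. A short sketch, using the ten disturbances listed in column 1 (the variable names are illustrative only):

```python
from itertools import accumulate

# Disturbances u_1, ..., u_10 from Table 13.1, column 1.
u = [0.5, -0.7, 0.3, 0.0, -2.3, -1.9, 0.2, -0.3, 0.2, -0.1]
eps0 = 3.0  # assumed initial error term

# Each error term is the previous error term plus the new disturbance.
eps = [round(e, 1) for e in accumulate(u, initial=eps0)]

for t, e in enumerate(eps):
    print(t, e)
```

The printed values for t = 0, ..., 10 match column 2 of the table, and the long run of positive values followed by a run of negative values is the persistence pattern plotted in Figure 13.1.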

Suppose that $X_t$ in the regression model represents time, such that $X_1 = 1$,
$X_2 = 2$, etc. Further, suppose we know that $\beta_0 = 2$ and $\beta_1 = .5$. Figure 13.2a
contains the true regression line and the observed Y values based on the error
terms in Figure 13.1. Figure 13.2b contains the estimated regression line, fitted
by ordinary least squares methods, and the observed Y values. Notice that the
fitted regression line differs sharply from the true regression line because the
initial $\varepsilon_0$ value was large and the succeeding positively autocorrelated error terms
tended to be large for some time. This persistency pattern in the positively
autocorrelated error terms leads to a fitted regression line far from the true one.
Had the initial $\varepsilon_0$ value been small, say, $\varepsilon_0 = -.2$, and the disturbances
different, a sharply different fitted regression line might have been obtained
because of the persistency pattern, as shown in Figure 13.2c. This variation from
sample to sample in the fitted regression lines because of the positively
autocorrelated error terms may be so substantial as to lead to large variances of the
estimated regression coefficients when ordinary least squares methods are used.
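
The contrast between the true and fitted lines can be reproduced numerically. The sketch below assumes the observations are those for periods 1 through 10, with the error terms of Table 13.1 and the true line 2 + .5t; whether Figure 13.2b was drawn from exactly these ten points is an assumption.

```python
import numpy as np

t = np.arange(1, 11)
# Error terms for t = 1, ..., 10 from Table 13.1, column 2.
eps = np.array([3.5, 2.8, 3.1, 3.1, 0.8, -1.1, -0.9, -1.2, -1.0, -1.1])
y = 2.0 + 0.5 * t + eps          # observations around the true line

# Ordinary least squares fit of Y on t; polyfit returns (slope, intercept).
b1, b0 = np.polyfit(t, y, 1)
print(round(b0, 3), round(b1, 3))
```

Under these assumptions the fitted intercept is about 6.25 and the fitted slope about -0.13, so the least squares line even slopes the wrong way relative to the true slope of .5.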

FIGURE 13.1  Example of positively autocorrelated error terms
(plot of the error term against time, t = 0, 1, ..., 10)

FIGURE 13.2  Regression with positively autocorrelated error terms
(a)  True regression line and observations when $\varepsilon_0 = 3$
(b)  Fitted regression line and observations when $\varepsilon_0 = 3$
(c)  Fitted regression line and observations with $\varepsilon_0 = -.2$

Another key problem with applying ordinary least squares methods when the
error terms are positively autocorrelated, as mentioned before, is that MSE may
seriously underestimate the variance of the $\varepsilon_t$. Figure 13.2 makes this clear. Note
that the variability of the Y values around the fitted regression line in
Figure 13.2b is substantially smaller than the variability of the Y values around the
true regression line in Figure 13.2a. This is one of the factors leading to an
indication of greater precision of the regression coefficients than is actually the
case when ordinary least squares methods are used in the presence of positively
autocorrelated errors.
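
The understatement of variability can be checked with the same ten observations. This sketch assumes the error terms of Table 13.1 for periods 1 through 10 and the true line 2 + .5t; the names `mse` and `around_true` are illustrative.

```python
import numpy as np

t = np.arange(1, 11)
eps = np.array([3.5, 2.8, 3.1, 3.1, 0.8, -1.1, -0.9, -1.2, -1.0, -1.1])
y = 2.0 + 0.5 * t + eps

b1, b0 = np.polyfit(t, y, 1)
resid = y - (b0 + b1 * t)

mse = float(resid @ resid) / (len(t) - 2)   # scatter around the fitted line
around_true = float(np.mean(eps ** 2))      # scatter around the true line
print(round(mse, 3), round(around_true, 3))
```

Here MSE comes out near 0.85, while the mean squared deviation of the observations from the true line is about 4.56, which is the gap between panels (a) and (b) of Figure 13.2 that the text describes.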

In  view  of  the  seriousness  of  the  problems  created  by  autocorrelated  errors,  it 
is  important  that  their  presence  be  detected.  A  plot  of  residuals  against  time  is  an 
effective,  though  subjective,  means  of  detecting  autocorrelated  errors.  Formal 
statistical  tests  have  also  been  developed.  A  widely  used  test  is  based  on  the 
first-order  autoregressive  error  model,  which  we  take  up  next.  This  model  is  a 
simple  one,  yet  experience  suggests  that  it  is  frequently  applicable  in  business 
and  economics  when  the  error  terms  are  serially  correlated. 

13.2  FIRST-ORDER AUTOREGRESSIVE ERROR MODEL

Simple linear regression

The  simple  linear  regression  model  for  one  independent  variable  with  the 
random  error  terms  following  a  first-order  autoregressive  process  is: 

(13.1)  $Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t$

        $\varepsilon_t = \rho \varepsilon_{t-1} + u_t$

where:

$\rho$ is a parameter such that $|\rho| < 1$
$u_t$ are independent $N(0, \sigma^2)$

Note that (13.1) is identical to the simple linear regression model (3.1) except
for the structure of the error terms. Each error term in model (13.1) consists of a
fraction of the previous error term (when $\rho > 0$) plus a new disturbance term $u_t$.
The parameter $\rho$ is called the autocorrelation parameter.
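The error recursion in model (13.1) can be sketched in a few lines of Python. This is only an illustration: the function names `ar1_errors` and `simulate_disturbances`, the seed, and the starting value `eps0` are assumptions made here, not part of the text.

```python
import random

def ar1_errors(rho, disturbances, eps0=0.0):
    """Build eps_t = rho * eps_{t-1} + u_t from a list of disturbances u_t."""
    if abs(rho) >= 1:
        raise ValueError("model (13.1) requires |rho| < 1")
    errors = []
    previous = eps0  # assumed starting error term
    for u in disturbances:
        current = rho * previous + u
        errors.append(current)
        previous = current
    return errors

def simulate_disturbances(n, sigma, seed=0):
    """Draw n independent N(0, sigma^2) disturbances (seed is arbitrary)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n)]

# Ten error terms with rho = 0.8, starting from an assumed eps_0 = 3.
u = simulate_disturbances(10, sigma=1.0)
print([round(e, 3) for e in ar1_errors(0.8, u, eps0=3.0)])
```

With a positive `rho`, each simulated error carries over a fraction of the one before it, so neighboring errors tend to lie on the same side of zero.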


Multiple  regression 

The  multiple  regression  model  with  the  random  error  terms  following  a  first- 
order  autoregressive  process  is: 


(13.2)  $Y_t = \beta_0 + \beta_1 X_{t1} + \beta_2 X_{t2} + \cdots + \beta_{p-1} X_{t,p-1} + \varepsilon_t$

        $\varepsilon_t = \rho \varepsilon_{t-1} + u_t$

where:

$|\rho| < 1$
$u_t$ are independent $N(0, \sigma^2)$


Thus,  we  see  again  that  the  multiple  regression  model  (13.2)  is  identical  to  the 
earlier  multiple  regression  model  (7.7)  except  for  the  structure  of  the  error  terms. 




Properties  of  error  terms 

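It is instructive to expand the definition of the first-order autoregressive error
term $\varepsilon_t$:

$\varepsilon_t = \rho \varepsilon_{t-1} + u_t$

Since $\varepsilon_{t-1} = \rho \varepsilon_{t-2} + u_{t-1}$, we obtain:

$\varepsilon_t = \rho(\rho \varepsilon_{t-2} + u_{t-1}) + u_t = \rho^2 \varepsilon_{t-2} + \rho u_{t-1} + u_t$

Replacing now $\varepsilon_{t-2}$ by $\rho \varepsilon_{t-3} + u_{t-2}$, we obtain:

$\varepsilon_t = \rho^3 \varepsilon_{t-3} + \rho^2 u_{t-2} + \rho u_{t-1} + u_t$

Continuing in this fashion, we find:

(13.3)  $\varepsilon_t = \sum_{s=0}^{\infty} \rho^s u_{t-s}$

Thus, the error term $\varepsilon_t$ in period $t$ is a linear combination of the current and
preceding disturbance terms. When $0 < \rho < 1$, (13.3) indicates that the further
the period is in the past, the smaller is the weight of that disturbance term in
determining $\varepsilon_t$.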

Mean. Since $E(u_t) = 0$ according to models (13.1) and (13.2) for all $t$, it
follows from (13.3) that:

(13.4)  $E(\varepsilon_t) = 0$

Thus, $\varepsilon_t$ has expectation zero, just as for regression models with uncorrelated
error terms.

Variance. Since according to models (13.1) and (13.2) the $u_t$ are independent
with variance $\sigma^2$, it follows from (13.3) that:

$\sigma^2(\varepsilon_t) = \sum_{s=0}^{\infty} \rho^{2s} \sigma^2(u_{t-s}) = \sigma^2 \sum_{s=0}^{\infty} \rho^{2s}$

Now for $|\rho| < 1$, it is known that:

$\sum_{s=0}^{\infty} \rho^{2s} = \frac{1}{1 - \rho^2}$

Hence, we have:

(13.5)  $\sigma^2(\varepsilon_t) = \frac{\sigma^2}{1 - \rho^2}$

We thus see that the error terms have constant variance, just as for regression
models with uncorrelated error terms.
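The step from the infinite sum to the closed form (13.5) can be checked numerically. The sketch below compares partial sums of the series with the closed form; the function names and the values $\rho = 0.8$, $\sigma^2 = 1$ are illustrative assumptions.

```python
def ar1_error_variance(rho, sigma2):
    """Closed form (13.5): sigma^2 / (1 - rho^2), valid for |rho| < 1."""
    if abs(rho) >= 1:
        raise ValueError("requires |rho| < 1")
    return sigma2 / (1.0 - rho ** 2)

def truncated_variance(rho, sigma2, terms):
    """Partial sum sigma^2 * sum_{s=0}^{terms-1} rho^(2s) from expansion (13.3)."""
    return sigma2 * sum(rho ** (2 * s) for s in range(terms))

rho, sigma2 = 0.8, 1.0  # illustrative values only
for terms in (1, 5, 20, 100):
    print(terms, round(truncated_variance(rho, sigma2, terms), 6))
print("closed form:", round(ar1_error_variance(rho, sigma2), 6))
```

The partial sums increase toward the closed form as more past disturbances are included. The closed form also shows that the variance of the error terms exceeds $\sigma^2$ whenever $\rho \neq 0$.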




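Covariance. To find the covariance of $\varepsilon_t$ and $\varepsilon_{t-1}$, we need to recognize
that:

$\sigma^2(\varepsilon_t) = E(\varepsilon_t^2)$
$\sigma(\varepsilon_t, \varepsilon_{t-1}) = E(\varepsilon_t \varepsilon_{t-1})$

These results follow from theorems (1.14a) and (1.19a), respectively, since
$E(\varepsilon_t) = 0$ by (13.4).

By (13.3), we have:

$E(\varepsilon_t \varepsilon_{t-1}) = E[(u_t + \rho u_{t-1} + \rho^2 u_{t-2} + \cdots)(u_{t-1} + \rho u_{t-2} + \rho^2 u_{t-3} + \cdots)]$

which can be rewritten:

$E(\varepsilon_t \varepsilon_{t-1}) = E\{[u_t + \rho(u_{t-1} + \rho u_{t-2} + \cdots)][u_{t-1} + \rho u_{t-2} + \rho^2 u_{t-3} + \cdots]\}$
$= E[u_t(u_{t-1} + \rho u_{t-2} + \rho^2 u_{t-3} + \cdots)] + E[\rho(u_{t-1} + \rho u_{t-2} + \rho^2 u_{t-3} + \cdots)^2]$

Since $E(u_t u_{t-s}) = 0$ for all $s \neq 0$ by the assumed independence of the $u_t$, the first
term drops out and we obtain:

$E(\varepsilon_t \varepsilon_{t-1}) = \rho E(\varepsilon_{t-1}^2) = \rho \sigma^2(\varepsilon_{t-1})$

Hence, by (13.5), which holds for all $t$, we have:

(13.6)  $\sigma(\varepsilon_t, \varepsilon_{t-1}) = \rho \left( \frac{\sigma^2}{1 - \rho^2} \right)$

In general, it can be shown that:

(13.7)  $\sigma(\varepsilon_t, \varepsilon_{t-s}) = \rho^s \left( \frac{\sigma^2}{1 - \rho^2} \right) \qquad s \neq 0$

Thus, the error terms for models (13.1) and (13.2) are autocorrelated unless the
autocorrelation parameter $\rho$ equals zero.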


Note 

It follows directly from (13.5) and (13.6) that the autocorrelation parameter $\rho$ is the
coefficient of correlation between $\varepsilon_t$ and $\varepsilon_{t-1}$, as defined in (15.3).
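The step behind this note can be written out. The notation $\operatorname{corr}(\cdot,\cdot)$ for the coefficient of correlation is an assumption introduced here for illustration; the ratio uses the covariance from (13.6) and the constant variance from (13.5):

```latex
\operatorname{corr}(\varepsilon_t, \varepsilon_{t-1})
  = \frac{\sigma(\varepsilon_t, \varepsilon_{t-1})}
         {\sigma(\varepsilon_t)\,\sigma(\varepsilon_{t-1})}
  = \frac{\rho\,\sigma^2 / (1 - \rho^2)}{\sigma^2 / (1 - \rho^2)}
  = \rho
```

The same ratio applied to (13.7) gives $\rho^s$ for error terms $s$ periods apart, so for $0 < \rho < 1$ the correlation weakens as the gap between periods grows.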


13.3  DURBIN-WATSON  TEST  FOR  AUTOCORRELATION 

The Durbin-Watson test assumes the first-order autoregressive error models
(13.1) or (13.2), with the values of the independent variable(s) fixed. The test
consists of determining whether or not the autocorrelation parameter $\rho$ in (13.1)
or (13.2) is zero. Note that if $\rho = 0$, $\varepsilon_t = u_t$. Hence, the error terms $\varepsilon_t$ are then
independent since the $u_t$ are independent.

Because  correlated  error  terms  in  business  and  economic  applications  tend  to 
show  positive  serial  correlation,  the  usual  test  alternatives  considered  are: 




(13.8)  $H_0: \rho = 0$
        $H_a: \rho > 0$

The test statistic D is obtained by first fitting the ordinary least squares regression
function and calculating the residuals:

(13.9)  $e_t = Y_t - \hat{Y}_t$

and then calculating the statistic:

(13.10)  $D = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$

where $n$ is the number of observations.

An exact test procedure is not available, but Durbin and Watson have obtained
lower and upper bounds $d_L$ and $d_U$ such that a value of D outside these bounds
leads to a definite decision. The decision rule for testing between the alternatives
in (13.8) is:

(13.11)  If $D > d_U$, conclude $H_0$
         If $D < d_L$, conclude $H_a$
         If $d_L \leq D \leq d_U$, the test is inconclusive

Small values of D lead to the conclusion that $\rho > 0$ because the adjacent error
terms $\varepsilon_t$ and $\varepsilon_{t-1}$ tend to be of the same magnitude when they are positively
autocorrelated. Hence, the differences in the residuals, $e_t - e_{t-1}$, would tend to
be small when $\rho > 0$, leading to a small numerator in D and hence to a small test
statistic D.

Table A-6 contains the bounds $d_L$ and $d_U$ for various sample sizes (n), for two
levels of significance (.05 and .01), and for various numbers of X variables
(p - 1) in the regression model.
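The statistic (13.10) and the decision rule (13.11) translate directly into code. The sketch below is a minimal illustration; the function names and return strings are assumptions made here, and the bounds must still be looked up in Table A-6.

```python
def durbin_watson(residuals):
    """Statistic (13.10): squared successive differences over sum of squares."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

def dw_decision(d, d_lower, d_upper):
    """Decision rule (13.11) for H0: rho = 0 versus Ha: rho > 0."""
    if d > d_upper:
        return "conclude H0"
    if d < d_lower:
        return "conclude Ha"
    return "inconclusive"

# Residuals that alternate in sign give a large D; constant residuals give D = 0.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
print(durbin_watson([1.0, 1.0, 1.0]))
```

The two printed cases show the extremes: residuals that stay on one side of zero drive D toward 0, while residuals that flip sign each period push it well above 2.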

Example 

The  Blaisdell  Company  wished  to  predict  its  sales  by  using  industry  sales  as  a 
predictor  variable.  (Accurate  predictions  of  industry  sales  are  available  from  the 
industry’s  trade  association.)  In  Table  13.2,  columns  1  and  2  contain  seasonally 
adjusted  quarterly  data  on  company  sales  and  industry  sales,  respectively,  for  the 
period  1977-81.  A  scatter  plot  (not  shown)  suggested  that  a  linear  regression 
model  is  appropriate.  The  market  research  analyst  was,  however,  concerned 
whether  or  not  the  error  terms  were  positively  autocorrelated.  He  therefore  used 
the  Durbin-Watson  test  with  the  alternatives: 


$H_0: \rho = 0$
$H_a: \rho > 0$




TABLE 13.2  Durbin-Watson test calculations for Blaisdell Company example (company
and industry sales data are seasonally adjusted)

Columns: (1) Company sales ($ millions), $Y_t$; (2) Industry sales ($ millions), $X_t$;
(3) Residual $e_t$; (4) $e_t - e_{t-1}$; (5) $(e_t - e_{t-1})^2$; (6) $e_t^2$

Year and            (1)      (2)       (3)         (4)         (5)         (6)
Quarter      t      Y_t      X_t       e_t

1977: 1      1     20.96    127.3   -.026052       -           -        .0006787
      2      2     21.40    130.0   -.062015    -.035963    .0012933    .0038459
      3      3     21.96    132.7    .022021     .084036    .0070620    .0004849
      4      4     21.52    129.4    .163754     .141733    .0200882    .0268154
1978: 1      5     22.39    135.0    .046570    -.117184    .0137321    .0021688
      2      6     22.76    137.1    .046377    -.000193    .0000000    .0021508
      3      7     23.48    141.2    .043617    -.002760    .0000076    .0019024
      4      8     23.66    142.8   -.058435    -.102052    .0104146    .0034146
1979: 1      9     24.10    145.5   -.094399    -.035964    .0012934    .0089112
      2     10     24.01    145.3   -.149142    -.054743    .0029968    .0222433
      3     11     24.54    148.3   -.147991     .001151    .0000013    .0219013
      4     12     24.30    146.4   -.053054     .094937    .0090130    .0028147
1980: 1     13     25.00    150.2   -.022928     .030126    .0009076    .0005257
      2     14     25.64    153.1    .105852     .128780    .0165843    .0112046
      3     15     26.36    157.3    .085464    -.020388    .0004157    .0073041
      4     16     26.98    160.7    .106102     .020638    .0004259    .0112576
1981: 1     17     27.52    164.2    .029112    -.076990    .0059275    .0008475
      2     18     27.78    165.6    .042316     .013204    .0001743    .0017906
      3     19     28.24    168.7   -.044160    -.086476    .0074781    .0019501
      4     20     28.78    171.7   -.033009     .011151    .0001243    .0010896

Total                                                       .0979400    .1333018

He fitted an ordinary least squares regression line to the data in Table 13.2.
The results are shown in Table 13.3a. He then calculated the residuals $e_t$, which
are shown in column 3 of Table 13.2 and which are plotted against time in
Figure 13.3. Note how the residuals consistently are above or below the fitted
values for extended periods. Positive autocorrelation in the error terms is suggested
by such a pattern when an appropriate regression function has been employed.

FIGURE 13.3  Residuals plotted against time: Blaisdell Company example
[Figure: residual (vertical axis, roughly -0.2 to 0.2) plotted against time (horizontal axis, t = 1 to 20)]




TABLE 13.3  Regression results for Blaisdell Company example

(a) Original Variables $Y_t$ and $X_t$

Regression       Estimated                   Estimated
Coefficient      Regression Coefficient      Standard Deviation

$\beta_0$        $b_0 = -1.45475$            $s(b_0) = .21415$
$\beta_1$        $b_1 = .17628$              $s(b_1) = .00144$

$\hat{Y} = -1.45475 + .17628X$

(b) Transformed Variables $Y'_t = Y_t - rY_{t-1}$ and $X'_t = X_t - rX_{t-1}$

Regression             Estimated                   Estimated
Coefficient            Regression Coefficient      Standard Deviation

$\beta'_0$             $b'_0 = -.39396$            $s(b'_0) = .16704$
$\beta'_1 = \beta_1$   $b'_1 = .17376$             $s(b'_1) = .00296$

$b_0 = \frac{b'_0}{1 - r} = \frac{-.39396}{1 - .631166} = -1.06811$

$s(b_0) = \frac{s(b'_0)}{1 - r} = \frac{.16704}{1 - .631166} = .45288$

(c) First Differences $Y'_t = Y_t - Y_{t-1}$ and $X'_t = X_t - X_{t-1}$

Regression             Estimated                   Estimated
Coefficient            Regression Coefficient      Standard Deviation

$\beta'_1 = \beta_1$   $b'_1 = .16849$             $s(b'_1) = .005096$


Columns 4, 5, and 6 of Table 13.2 contain the necessary calculations for the
test statistic D. The analyst then obtained:

$D = \frac{\sum_{t=2}^{20} (e_t - e_{t-1})^2}{\sum_{t=1}^{20} e_t^2} = \frac{.09794}{.13330} = .735$

Using a level of significance of .01, he found in Table A-6 for n = 20 and
p - 1 = 1:

$d_L = .95$
$d_U = 1.15$

Since D = .735 falls below $d_L = .95$, decision rule (13.11) indicates that the
appropriate conclusion is $H_a$, namely, that the error terms are positively
autocorrelated.
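The Blaisdell calculations can be retraced with a short script. This is a sketch, not the analyst's procedure: the data are transcribed from Table 13.2, the helper names `fit_line` and `durbin_watson` are assumptions, and the bounds are the Table A-6 values quoted in the text.

```python
# Company sales Y and industry sales X ($ millions), transcribed from Table 13.2.
Y = [20.96, 21.40, 21.96, 21.52, 22.39, 22.76, 23.48, 23.66, 24.10, 24.01,
     24.54, 24.30, 25.00, 25.64, 26.36, 26.98, 27.52, 27.78, 28.24, 28.78]
X = [127.3, 130.0, 132.7, 129.4, 135.0, 137.1, 141.2, 142.8, 145.5, 145.3,
     148.3, 146.4, 150.2, 153.1, 157.3, 160.7, 164.2, 165.6, 168.7, 171.7]

def fit_line(x, y):
    """Ordinary least squares intercept and slope for one predictor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def durbin_watson(e):
    """Statistic (13.10) computed from a list of residuals."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(v ** 2 for v in e)

b0, b1 = fit_line(X, Y)
residuals = [y - (b0 + b1 * x) for x, y in zip(X, Y)]
D = durbin_watson(residuals)

# Bounds quoted in the text for n = 20, p - 1 = 1, level of significance .01.
D_LOWER, D_UPPER = 0.95, 1.15
print(f"b0 = {b0:.5f}, b1 = {b1:.5f}, D = {D:.3f}")
print("conclude Ha" if D < D_LOWER else
      "conclude H0" if D > D_UPPER else "inconclusive")
```

The printed coefficients and statistic can be compared with Table 13.3a and with the value of D reported above.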


Comments 

1. If a test for negative autocorrelation is required, the test statistic to be used is
$4 - D$, where D is defined as above. The test is then conducted in the same manner
described for testing for positive autocorrelation. That is, if the quantity $4 - D$ falls below
$d_L$, we conclude $\rho < 0$, that negative autocorrelation exists, and so on.

2. A two-sided test for $H_0: \rho = 0$ versus $H_a: \rho \neq 0$ can be made by employing both
one-sided tests separately. The Type I risk with the two-sided test is $2\alpha$, where $\alpha$ is the
Type I risk with each one-sided test.

3. When the Durbin-Watson test employing the bounds $d_L$ and $d_U$ gives indeterminate
results, in principle more observations are required. Of course, with time series data it
may be impossible to obtain more observations, or additional observations may lie in the
future and be obtainable only with great delay. Durbin and Watson (Ref. 13.1) do give an
approximate test which may be used when the bounds test is indeterminate, but the
degrees of freedom should be larger than about 40 before this approximate test will give
more than a rough indication of whether autocorrelation exists.

A  reasonable  procedure  is  to  treat  indeterminate  results  as  suggesting  the  presence  of 
autocorrelated  errors  and  employ  one  of  the  remedial  actions  to  be  discussed  next.  If  such 
an  action  does  not  lead  to  substantially  different  regression  results,  the  assumption  of 
uncorrelated  error  terms  would  appear  to  be  satisfactory.  When  the  remedial  action  does 
lead  to  substantially  different  regression  results  (such  as  larger  estimated  standard  errors 
for  the  regression  coefficients  or  the  elimination  of  autocorrelated  errors),  the  results 
obtained  by  means  of  the  remedial  action  are  probably  the  more  useful  ones. 

4.  The  Durbin-Watson  test  is  not  robust  against  misspecifications  of  the  model.  For 
example,  the  Durbin-Watson  test  may  not  disclose  the  presence  of  autocorrelated  errors 
that  follow  a  second-order  autoregressive  pattern,  where  an  error  term  in  period  t  is 
directly  related  to  the  error  term  in  period  t  —  2. 

5.  While  the  Durbin-Watson  test  is  widely  used,  other  tests  for  autocorrelation  are 
available.  One  such  alternative  test,  due  to  Theil  and  Nagar,  is  found  in  Reference  13.2. 
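Comments 1 and 2 can be sketched as follows. The function names and return strings are assumptions made for illustration; the logic simply applies decision rule (13.11) once to D and once to 4 - D.

```python
def one_sided(stat, d_lower, d_upper, alternative):
    """Apply rule (13.11) to a statistic; `alternative` labels the Ha conclusion."""
    if stat < d_lower:
        return f"conclude {alternative}"
    if stat > d_upper:
        return "conclude rho = 0"
    return "inconclusive"

def dw_negative(d, d_lower, d_upper):
    """Comment 1: test for negative autocorrelation using 4 - D."""
    return one_sided(4.0 - d, d_lower, d_upper, "rho < 0")

def dw_two_sided(d, d_lower, d_upper):
    """Comment 2: both one-sided tests; overall Type I risk is 2 * alpha."""
    positive = one_sided(d, d_lower, d_upper, "rho > 0")
    negative = dw_negative(d, d_lower, d_upper)
    for result in (positive, negative):
        if result not in ("conclude rho = 0", "inconclusive"):
            return result
    if positive == negative == "conclude rho = 0":
        return "conclude rho = 0"
    return "inconclusive"

print(dw_two_sided(0.735, 0.95, 1.15))
```

The two-sided result is "conclude rho = 0" only when both one-sided tests reach that conclusion; if either one is indeterminate, so is the combined test.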

13.4  REMEDIAL  MEASURES  FOR  AUTOCORRELATION 

The  two  principal  remedial  measures  when  autocorrelated  error  terms  are 
present  are  to  add  one  or  more  independent  variables  to  the  model  or  to  use 
transformed  variables. 

Addition  of  independent  variables 

As  noted  earlier,  one  major  cause  of  autocorrelated  error  terms  is  the  omission 
from  the  model  of  one  or  more  key  independent  variables  that  have  time-ordered 
effects  on  the  dependent  variable.  When  autocorrelated  error  terms  are  found  to 
be  present,  the  first  remedial  action  should  always  be  to  search  for  missing  key 
independent  variables.  The  missing  variable,  population  size,  was  mentioned 
previously  for  a  regression  of  annual  sales  of  a  product  on  average  yearly  price  of 
the  product  during  a  30-year  period.  Sometimes,  use  of  a  simple  linear  trend 
variable or use of indicator variables for seasonal effects can be helpful in eliminating
or reducing autocorrelation in the error terms.

Use  of  transformed  variables 

Only when use of additional independent variables is not helpful in eliminating
the problem of autocorrelated errors should a remedial action based on transformed
variables be employed. We now explain two methods of transforming the
variables. One is based on an iterative approach and the other uses first differences.
We shall explain these methods for simple linear regression, but the same
principles apply for multiple regression.

Iterative approach. The iterative approach for simple linear regression is
motivated by an interesting property of model (13.1). Consider the transformed
dependent variable:

$Y'_t = Y_t - \rho Y_{t-1}$

Substituting in this expression for $Y_t$ and $Y_{t-1}$ according to model (13.1), we
obtain:

$Y'_t = (\beta_0 + \beta_1 X_t + \varepsilon_t) - \rho(\beta_0 + \beta_1 X_{t-1} + \varepsilon_{t-1})$
$= \beta_0(1 - \rho) + \beta_1(X_t - \rho X_{t-1}) + (\varepsilon_t - \rho \varepsilon_{t-1})$

But by (13.1), $\varepsilon_t - \rho \varepsilon_{t-1} = u_t$. Hence:

$Y'_t = \beta_0(1 - \rho) + \beta_1(X_t - \rho X_{t-1}) + u_t$

where the $u_t$ are independent error terms. Thus, when we use the transformed
variables:

(13.12a)  $Y'_t = Y_t - \rho Y_{t-1}$

(13.12b)  $X'_t = X_t - \rho X_{t-1}$

the transformed regression model becomes:

(13.13)  $Y'_t = \beta'_0 + \beta'_1 X'_t + u_t$

where:

$\beta'_0 = \beta_0(1 - \rho)$
$\beta'_1 = \beta_1$

Note that the transformed regression model (13.13) has independent error terms.
This means that ordinary least squares methods have their usual optimum properties
with this model.

The parameters in the original model (13.1) are related to the parameters in
the transformed model (13.13) as follows:

(13.14a)  $\beta_0 = \frac{\beta'_0}{1 - \rho}$

(13.14b)  $\beta_1 = \beta'_1$

The transformed model (13.13) cannot be used directly because the autocorrelation
parameter $\rho$ needed to obtain the transformed variables in (13.12) is
unknown. We can, however, estimate $\rho$. Note that the autoregressive error process
assumed in model (13.1) can be viewed as a regression through the origin:

$\varepsilon_t = \rho \varepsilon_{t-1} + u_t$

where $\varepsilon_t$ is the dependent variable, $\varepsilon_{t-1}$ the independent variable, $u_t$ the error
term, and $\rho$ the slope of the line through the origin. Since the $\varepsilon_t$ and $\varepsilon_{t-1}$ are
unknown, we use the residuals $e_t$ and $e_{t-1}$, obtained by ordinary least squares
methods, as the dependent and independent variables, respectively, and estimate
$\rho$ by fitting a straight line through the origin. From our previous discussion of
regression through the origin, we know by (5.17) that the estimate of the slope $\rho$,
denoted by r, is:

(13.15)  $r = \frac{\sum_{t=2}^{n} e_{t-1} e_t}{\sum_{t=2}^{n} e_{t-1}^2}$

We now obtain the transformed variables:

(13.16a)  $Y'_t = Y_t - rY_{t-1}$

(13.16b)  $X'_t = X_t - rX_{t-1}$

and use ordinary least squares with these transformed variables.

The Durbin-Watson test is then employed to test whether the error terms for
the transformed model are uncorrelated. If the test indicates that they are uncorrelated,
the procedure terminates. Otherwise, the parameter $\rho$ is reestimated from
the new residuals for the regression model with the original variables, using the
regression coefficients derived from the fit of the regression model with the
transformed variables. A new set of transformed variables is then obtained with
the new r. This process may be continued for several iterations until the
Durbin-Watson test suggests that the error terms in the transformed model are
uncorrelated.
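The two computations repeated in each pass of the iterative approach, estimating $\rho$ by (13.15) and forming the transformed variables (13.16), can be sketched as small helpers. The names `estimate_rho` and `transform` are assumptions introduced here; the refitting and the Durbin-Watson check between passes are left out of this sketch.

```python
def estimate_rho(residuals):
    """Slope (13.15) of the regression of e_t on e_{t-1} through the origin."""
    pairs = range(1, len(residuals))
    numerator = sum(residuals[t - 1] * residuals[t] for t in pairs)
    denominator = sum(residuals[t - 1] ** 2 for t in pairs)
    return numerator / denominator

def transform(series, r):
    """Transformed variable (13.16): z_t - r * z_{t-1} for t = 2, ..., n."""
    return [series[t] - r * series[t - 1] for t in range(1, len(series))]

# Tiny made-up inputs, chosen only so the arithmetic is easy to follow.
print(estimate_rho([1.0, 2.0, 4.0]))    # (1*2 + 2*4) / (1 + 4)
print(transform([1.0, 2.0, 4.0], 0.5))  # [2 - 0.5*1, 4 - 0.5*2]
```

Note that the transformed series has one fewer observation than the original, since the first period has no predecessor.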


Example. For our Blaisdell Company example, the necessary calculations
for estimating the autocorrelation parameter $\rho$, based on the residuals obtained
with ordinary least squares applied to the original variables, appear in Table
13.4. Column 1 repeats the residuals from Table 13.2. Column 2 contains the
residuals $e_{t-1}$, and columns 3 and 4 contain the necessary calculations. Hence,
we estimate:

$r = \frac{.0834478}{.1322122} = .631166$

We now obtain the transformed variables $Y'_t$ and $X'_t$ in (13.16a) and (13.16b):

$Y'_t = Y_t - .631166Y_{t-1}$

$X'_t = X_t - .631166X_{t-1}$

These are shown in Table 13.5. Ordinary least squares fitting of linear regression
is now used with these transformed variables. The results are shown in Table
13.3b.
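The first iteration for the Blaisdell data can be retraced as below. This is a sketch under stated assumptions: the data are transcribed from Table 13.2, r is taken as the value .631166 reported in the text rather than recomputed, and `fit_line` is a helper name introduced here.

```python
# Company sales Y and industry sales X ($ millions), transcribed from Table 13.2.
Y = [20.96, 21.40, 21.96, 21.52, 22.39, 22.76, 23.48, 23.66, 24.10, 24.01,
     24.54, 24.30, 25.00, 25.64, 26.36, 26.98, 27.52, 27.78, 28.24, 28.78]
X = [127.3, 130.0, 132.7, 129.4, 135.0, 137.1, 141.2, 142.8, 145.5, 145.3,
     148.3, 146.4, 150.2, 153.1, 157.3, 160.7, 164.2, 165.6, 168.7, 171.7]
R = 0.631166  # estimate of rho reported in the text

def fit_line(x, y):
    """Ordinary least squares intercept and slope for one predictor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Transformed variables (13.16a) and (13.16b) for t = 2, ..., 20.
y_prime = [Y[t] - R * Y[t - 1] for t in range(1, len(Y))]
x_prime = [X[t] - R * X[t - 1] for t in range(1, len(X))]

b0_prime, b1_prime = fit_line(x_prime, y_prime)
print(f"first transformed pair: {y_prime[0]:.4f}, {x_prime[0]:.3f}")
print(f"b0' = {b0_prime:.5f}, b1' = {b1_prime:.5f}")
```

The first transformed pair can be checked against the t = 2 row of Table 13.5, and the fitted coefficients against Table 13.3b.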




TABLE 13.4  Calculations for estimating $\rho$ for Blaisdell Company example

             (1)          (2)            (3)              (4)
  t          e_t        e_{t-1}      e_t e_{t-1}       e_{t-1}^2

  1       -.026052        -              -                -
  2       -.062015     -.026052       .0016156         .0006787
  3        .022021     -.062015      -.0013656         .0038459
  4        .163754      .022021       .0036060         .0004849
  5        .046570      .163754       .0076260         .0268154
  6        .046377      .046570       .0021598         .0021688
  7        .043617      .046377       .0020228         .0021508
  8       -.058435      .043617      -.0025488         .0019024
  9       -.094399     -.058435       .0055162         .0034146
 10       -.149142     -.094399       .0140789         .0089112
 11       -.147991     -.149142       .0220718         .0222433
 12       -.053054     -.147991       .0078515         .0219013
 13       -.022928     -.053054       .0012164         .0028147
 14        .105852     -.022928      -.0024270         .0005257
 15        .085464      .105852       .0090465         .0112046
 16        .106102      .085464       .0090679         .0073041
 17        .029112      .106102       .0030889         .0112576
 18        .042316      .029112       .0012319         .0008475
 19       -.044160      .042316      -.0018687         .0017906
 20       -.033009     -.044160       .0014577         .0019501

Total                                 .0834478         .1322122

TABLE 13.5  Transformed variables for first iteration: Blaisdell Company example

Columns: (3) $Y'_t = Y_t - .631166Y_{t-1}$; (4) $X'_t = X_t - .631166X_{t-1}$

           (1)       (2)        (3)         (4)
  t        Y_t       X_t        Y'_t        X'_t

  1       20.96     127.3        -           -
  2       21.40     130.0      8.1708      49.653
  3       21.96     132.7      8.4530      50.648
  4       21.52     129.4      7.6596      45.644
  5       22.39     135.0      8.8073      53.327
  6       22.76     137.1      8.6282      51.893
  7       23.48     141.2      9.1147      54.667
  8       23.66     142.8      8.8402      53.679
  9       24.10     145.5      9.1666      55.369
 10       24.01     145.3      8.7989      53.465
 11       24.54     148.3      9.3857      56.592
 12       24.30     146.4      8.8112      52.798
 13       25.00     150.2      9.6627      57.797
 14       25.64     153.1      9.8608      58.299
 15       26.36     157.3     10.1769      60.668
 16       26.98     160.7     10.3425      61.418
 17       27.52     164.2     10.4911      62.772
 18       27.78     165.6     10.4103      61.963
 19       28.24     168.7     10.7062      64.179
 20       28.78     171.7     10.9559      65.222



Based on the fitted regression for the transformed variables in Table 13.3b,
residuals were obtained and the Durbin-Watson statistic calculated. The result
was (calculations not shown) D = 1.65. From Table A-6, we find for $\alpha = .01$,
p - 1 = 1, and n = 19:

$d_L = .93$    $d_U = 1.13$

Since $D = 1.65 > d_U = 1.13$, we conclude that the autocorrelation coefficient
for the error terms in the model with the transformed variables is zero.

Having  successfully  handled  the  problem  of  autocorrelated  error  terms,  we 
now  transform  the  estimated  regression  coefficients  and  standard  deviations  back 
to  the  model  with  the  original  variables  (Table  13.3b): 

$b_0 = -1.06811$    $s(b_0) = .45288$

$b_1 = .17376$      $s(b_1) = .00296$

Note that the estimated regression coefficients $b_0 = -1.06811$ and $b_1 = .17376$
obtained with the iterative method are close to those obtained with ordinary least
squares (Table 13.3a), but that the estimated standard errors $s(b_0) = .45288$ and
$s(b_1) = .00296$ with the iterative method are larger than their ordinary-least-squares
counterparts. The larger standard errors are to be expected, since we
noted earlier that positive autocorrelation may lead to estimated standard deviations
$s(b_k)$ calculated according to ordinary least squares that seriously underestimate
the true standard deviations $\sigma(b_k)$.
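The back-transformation follows (13.14a) and (13.14b) with r substituted for $\rho$. A minimal sketch, using the values from Table 13.3b as inputs; the function name `back_transform` and the dictionary keys are assumptions made here.

```python
def back_transform(b0_prime, s_b0_prime, b1_prime, s_b1_prime, r):
    """Apply (13.14) with r in place of rho; slope quantities carry over unchanged."""
    scale = 1.0 - r
    return {
        "b0": b0_prime / scale,
        "s_b0": s_b0_prime / scale,
        "b1": b1_prime,
        "s_b1": s_b1_prime,
    }

# Inputs taken from Table 13.3b and the estimate r = .631166.
result = back_transform(-0.39396, 0.16704, 0.17376, 0.00296, 0.631166)
for name, value in result.items():
    print(name, round(value, 5))
```

Only the intercept and its standard deviation are rescaled, because $\beta'_1 = \beta_1$ in the transformed model.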

Comments 

1. The iterative approach does not always work properly. A major reason is that when
the error terms are positively autocorrelated, the estimate r in (13.15) tends to underestimate
the autocorrelation parameter $\rho$. When this bias is serious, it can significantly reduce
the effectiveness of the iterative approach.

2. There exists an approximate relation between the Durbin-Watson test statistic D in
(13.10) and the estimated autocorrelation parameter r in (13.15):

(13.17)  $D \approx 2(1 - r)$

This relation indicates that the Durbin-Watson statistic ranges approximately between 0
and 4 since r takes on values between -1 and 1, and that D is approximately 2 when
r = 0. Note that for the Blaisdell Company example, D = .735, r = .631, and 2(1 - r)
= .738.
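The relation between D and r can be seen by expanding the numerator of (13.10). The sketch below assumes that the three sums of squared residuals, which differ only in whether $e_1^2$ or $e_n^2$ is included, are nearly equal:

```latex
\begin{aligned}
D &= \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}
   = \frac{\sum_{t=2}^{n} e_t^2 + \sum_{t=2}^{n} e_{t-1}^2
           - 2\sum_{t=2}^{n} e_{t-1} e_t}{\sum_{t=1}^{n} e_t^2} \\
  &\approx \frac{2\sum_{t=2}^{n} e_{t-1}^2 - 2\sum_{t=2}^{n} e_{t-1} e_t}
                {\sum_{t=2}^{n} e_{t-1}^2}
   = 2(1 - r)
\end{aligned}
```

In the Blaisdell example the sums are .1333018 (Table 13.2, column 6) and .1322122 (Table 13.4, column 4), which is why the approximation is close there.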

First differences approach. Some economists and statisticians have suggested
that instead of iterative estimation of $\rho$, which is not always successful,
the autocorrelation parameter be assumed to equal 1. If $\rho = 1$, $\beta'_0 = \beta_0(1 - \rho)
= 0$, and the transformed model (13.13) becomes:

(13.18)  $Y'_t = \beta'_1 X'_t + u_t$

where:

(13.18a)  $Y'_t = Y_t - Y_{t-1}$
          $X'_t = X_t - X_{t-1}$

Thus, the regression coefficient $\beta'_1 = \beta_1$ can be directly estimated by ordinary
least squares methods for regression through the origin. Note that the transformed
variables (13.18a) are ordinary first differences. It has been found that
this first differences approach is effective in a variety of applications in reducing
the autocorrelations of the error terms, and of course it is much simpler than the
iterative approach.


Example. Table 13.6 contains the transformed variables $Y'_t$ and $X'_t$, based on
the first differences transformations in (13.18a) for our Blaisdell Company example.
Application of ordinary least squares for estimating a linear regression
through the origin led to the results shown in Table 13.3c. Note that the estimated
regression coefficient $b'_1 = .16849$ is similar to that obtained with ordinary least
squares applied to the original variables ($b_1 = .17628$), but it has a larger standard
error, again as expected.
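The first differences fit can be retraced with a regression through the origin. This is a sketch: the data are transcribed from Table 13.2, the helper name `fit_through_origin` is an assumption, and the standard deviation uses the usual no-intercept formula with n - 1 degrees of freedom for the differenced observations.

```python
# Company sales Y and industry sales X ($ millions), transcribed from Table 13.2.
Y = [20.96, 21.40, 21.96, 21.52, 22.39, 22.76, 23.48, 23.66, 24.10, 24.01,
     24.54, 24.30, 25.00, 25.64, 26.36, 26.98, 27.52, 27.78, 28.24, 28.78]
X = [127.3, 130.0, 132.7, 129.4, 135.0, 137.1, 141.2, 142.8, 145.5, 145.3,
     148.3, 146.4, 150.2, 153.1, 157.3, 160.7, 164.2, 165.6, 168.7, 171.7]

# First differences (13.18a) for t = 2, ..., 20.
y_diff = [Y[t] - Y[t - 1] for t in range(1, len(Y))]
x_diff = [X[t] - X[t - 1] for t in range(1, len(X))]

def fit_through_origin(x, y):
    """Least squares slope for a no-intercept model and its estimated std. deviation."""
    sxx = sum(xi * xi for xi in x)
    slope = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    sse = sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))
    mse = sse / (len(x) - 1)  # one parameter estimated
    return slope, (mse / sxx) ** 0.5

b1_prime, s_b1_prime = fit_through_origin(x_diff, y_diff)
print(f"b1' = {b1_prime:.5f}, s(b1') = {s_b1_prime:.6f}")
```

The printed values can be compared with Table 13.3c.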

TABLE 13.6  First differences for Blaisdell Company data

Columns: (3) $Y'_t = Y_t - Y_{t-1}$; (4) $X'_t = X_t - X_{t-1}$

           (1)       (2)       (3)       (4)
  t        Y_t       X_t       Y'_t      X'_t

  1       20.96     127.3       -         -
  2       21.40     130.0      .44       2.7
  3       21.96     132.7      .56       2.7
  4       21.52     129.4     -.44      -3.3
  5       22.39     135.0      .87       5.6
  6       22.76     137.1      .37       2.1
  7       23.48     141.2      .72       4.1
  8       23.66     142.8      .18       1.6
  9       24.10     145.5      .44       2.7
 10       24.01     145.3     -.09       -.2
 11       24.54     148.3      .53       3.0
 12       24.30     146.4     -.24      -1.9
 13       25.00     150.2      .70       3.8
 14       25.64     153.1      .64       2.9
 15       26.36     157.3      .72       4.2
 16       26.98     160.7      .62       3.4
 17       27.52     164.2      .54       3.5
 18       27.78     165.6      .26       1.4
 19       28.24     168.7      .46       3.1
 20       28.78     171.7      .54       3.0

Note 

Sometimes the first differences approach can overcorrect, leading to negative
autocorrelations in the error terms. Hence, it may be appropriate to use a two-sided
Durbin-Watson test when testing for autocorrelation with first differences data. One complication
arises here. The first differences model (13.18) has no intercept term, but the
Durbin-Watson test requires a fitted regression with an intercept term. A valid test for
autocorrelation in a no-intercept model can be carried out by fitting for this purpose a
regression function with an intercept term. Of course, the fitted no-intercept model is still the
model of basic interest. In our Blaisdell Company example, the Durbin-Watson statistic
for the fitted first differences regression model with an intercept term is D = 1.75
(calculation not shown). This indicates uncorrelated error terms for either a one-sided test (with
$\alpha = .01$) or a two-sided test (with $\alpha = .02$).


Comments 

1. The autoregressive error structure can also be used to advantage in situations where
predictions of the dependent variable are to be made. Johnston (Ref. 13.3) discusses this
problem.

2. The first-order autoregressive error process in models (13.1) and (13.2) is the
simplest kind. A second-order process would be:

(13.19)  $\varepsilon_t = \rho_1 \varepsilon_{t-1} + \rho_2 \varepsilon_{t-2} + u_t$

Still higher-order processes could be postulated. Specialized approaches have been developed
for complex autoregressive error processes. These are discussed in treatments of
time series procedures and forecasting, such as in Reference 13.4.


PROBLEMS 

13.1.  Refer  to  Table  13.1. 

a.  Plot $\varepsilon_t$ against $\varepsilon_{t-1}$ for t = 1, ..., 10 on a graph. How is the positive
first-order autocorrelation in the error terms shown by the plot?

b.  If you plotted $u_t$ against $\varepsilon_{t-1}$ for t = 1, ..., 10, what pattern would you
expect?

13.2.  Refer  to  Plastic  hardness  Problem  2.18.  If  the  same  test  item  were  measured  at 
12  different  points  in  time,  would  the  error  terms  in  the  regression  model  likely 
be  autocorrelated?  Discuss. 

13.3.  A  student  stated  that  the  first-order  autoregressive  error  models  (13.1)  and  (13.2) 
are  too  simple  for  business  time  series  data  because  the  error  term  in  period  t  in 
such  data  is  also  influenced  by  random  effects  that  occurred  more  than  one 
period  in  the  past.  Comment. 

13.4.  A student writing a term paper used ordinary least squares in fitting a simple
linear regression model to some time series data containing positively autocorrelated
errors, and found that the 90 percent confidence interval for $\beta_1$ was
too wide to be useful. She then decided to employ model (13.1) to improve the
precision of the estimate. Comment.

13.5.  For each of the following tests concerning the autocorrelation parameter $\rho$ in
model (13.2) with three independent variables, state the appropriate decision
rule based on the Durbin-Watson statistic for a sample of size 38: (1) $H_0: \rho = 0$,
$H_a: \rho \neq 0$, $\alpha = .02$; (2) $H_0: \rho = 0$, $H_a: \rho < 0$, $\alpha = .05$; (3) $H_0: \rho = 0$,
$H_a: \rho > 0$, $\alpha = .01$.

13.6.  Refer to Calculator maintenance Problem 2.16. The observations are listed in
time order. Assume that regression model (13.1) is appropriate. Test whether or
not positive autocorrelation is present; use $\alpha = .01$. State the alternatives, decision
rule, and conclusion.

13.7.  Refer to Chemical shipment Problem 7.12. The observations are listed in time
order. Assume that regression model (13.2) is appropriate. Test whether or not
positive autocorrelation is present; use $\alpha = .05$. State the alternatives, decision
rule, and conclusion.

13.8.  Refer to Crop yield Problem 9.13. The observations are listed in time order.
Assume that regression model (13.2) with first- and second-order terms for the
two independent variables and no interaction term is appropriate. Test whether or
not positive autocorrelation is present; use $\alpha = .01$. State the alternatives, decision
rule, and conclusion.

13.9.  Microcomputer components. A staff analyst for a manufacturer of microcomputer
components has compiled monthly data for the past 16 months on the value
of industry production of processing units that use these components (X, in
million dollars) and the value of the firm's components used (Y, in thousand
dollars). The analyst believes that a simple linear regression relation is appropriate
but anticipates positive autocorrelation. The data follow.

t:        1       2       3       4       5       6       7       8
X_t:    2.052   2.026   2.002   1.949   1.942   1.887   1.986   2.053
Y_t:    102.9   101.5   100.8    98.0    97.3    93.5    97.5   102.2

t:        9      10      11      12      13      14      15      16
X_t:    2.102   2.113   2.058   2.060   2.035   2.080   2.102   2.150
Y_t:    105.0   107.2   105.1   103.9   103.0   104.8   105.0   107.2

a.  Fit a simple linear regression model by ordinary least squares and obtain the
residuals. Also obtain $s(b_0)$ and $s(b_1)$.

b.  Plot  the  residuals  against  time  and  explain  whether  you  find  any  evidence  of 
positive  autocorrelation. 

c.  Conduct  a  formal  test  for  positive  autocorrelation  using  a  significance  level 
of  .05.  State  the  alternatives,  decision  rule,  and  conclusion.  Is  the  residual 
analysis  in  part  (b)  in  accord  with  the  test  result? 

13.10. Refer to Microcomputer components Problem 13.9. The analyst has decided to
employ regression model (13.1) and use the iterative approach to fit the model.

a.  Obtain  a  point  estimate  of  the  autocorrelation  parameter.  How  well  does  the 
approximate  relationship  (13.17)  hold  here  between  this  point  estimate  and 
the  Durbin-Watson  test  statistic? 

b. Use one iteration to obtain the estimates $b_0'$ and $b_1'$ of the regression
coefficients $\beta_0'$ and $\beta_1'$ in transformed model (13.13) and state the estimated
regression function. Also obtain $s(b_0')$ and $s(b_1')$.

c.  Test  whether  any  positive  autocorrelation  remains  after  the  first  iteration 
using  a  significance  level  of  .05.  State  the  alternatives,  decision  rule,  and 
conclusion. 

d. Restate the estimated regression function obtained in part (b) in terms of the
original variables. Also obtain $s(b_0)$ and $s(b_1)$. Compare the estimated
regression coefficients obtained with the iterative method and their standard
errors with those obtained with ordinary least squares in Problem 13.9a.

e.  Based  on  the  results  in  parts  (c)  and  (d),  does  the  iterative  method  appear  to 
have  been  effective  here? 


13.11. Refer to Microcomputer components Problems 13.9 and 13.10. The staff
analyst wishes to try also the first differences approach.

a. Estimate the regression coefficient $\beta_1$ by the first differences approach, and
obtain the estimated standard deviation of this estimate.

b.  Compare  the  results  obtained  in  part  (a)  with  those  in  Problem  13.10d. 
Summarize  your  findings. 

c. Test whether or not the error terms with the first differences approach are
autocorrelated using a two-sided test and a level of significance of .10. State
the alternatives, decision rule, and conclusion. Why is a two-sided test
meaningful here?

13.12. Advertising agency. The managing partner of an advertising agency has
become concerned about possible inefficiencies in the handling of client accounts.
Monthly data on amount of billings ($Y$, in thousands of constant dollars) and on
number of hours of staff time ($X$, in thousand hours) for the 20 most recent
months follow. A simple linear regression model is believed to be appropriate,
but positively autocorrelated error terms may be present.

t:       1      2      3      4      5      6      7      8      9     10
Xt:  2.521  2.171  2.234  2.524  2.305  2.523  3.020  3.014  3.532  3.461
Yt:  220.4  203.9  207.2  221.9  211.3  222.7  247.6  247.6  272.9  269.1

t:      11     12     13     14     15     16     17     18     19     20
Xt:  3.737  3.801  3.576  3.586  3.447  2.723  3.019  3.117  3.623  3.618
Yt:  283.9  287.0  275.4  275.1  269.1  232.8  248.1  252.4  278.6  278.5

a. Fit a simple linear regression model by ordinary least squares and obtain the
residuals. Also obtain $s(b_0)$ and $s(b_1)$.

b.  Plot  the  residuals  against  time  and  explain  whether  you  find  any  evidence  of 
positive  autocorrelation. 

c.  Conduct  a  formal  test  for  positive  autocorrelation  using  a  significance  level 
of  .01.  State  the  alternatives,  decision  rule,  and  conclusion.  Is  the  residual 
analysis  in  part  (b)  in  accord  with  the  test  result? 

13.13.  Refer  to  Advertising  agency  Problem  13.12.  Assume  that  regression  model 
(13.1)  is  applicable. 

a.  Obtain  a  point  estimate  of  the  autocorrelation  parameter.  How  well  does  the 
approximate  relationship  (13.17)  hold  here  between  the  point  estimate  and 
the  Durbin-Watson  test  statistic? 

b. Use one iteration to obtain the estimates $b_0'$ and $b_1'$ of the regression
coefficients $\beta_0'$ and $\beta_1'$ in transformed model (13.13) and state the estimated
regression function. Also obtain $s(b_0')$ and $s(b_1')$.

c.  Test  whether  any  positive  autocorrelation  remains  after  the  first  iteration 
using  a  significance  level  of  .01.  State  the  alternatives,  decision  rule,  and 
conclusion. 

d. Restate the estimated regression function obtained in part (b) in terms of the
original variables. Also obtain $s(b_0)$ and $s(b_1)$. Compare the estimated
regression coefficients obtained with the iterative method and their standard
errors with those obtained with ordinary least squares in Problem 13.12a.

e.  Based  on  the  results  in  parts  (c)  and  (d),  does  the  iterative  method  appear  to 
have  been  effective  here? 


13.14.  Refer  to  Advertising  agency  Problems  13.12  and  13.13. 

a. Estimate the regression coefficient $\beta_1$ using the first differences approach by
means of a 95 percent confidence interval. Interpret your interval estimate.

b. How does the estimated standard deviation of $b_1$ for the first differences
estimate obtained in part (a) compare with that for the iterative method
estimate in Problem 13.13d?

13.15. McGill Company sales. The data below show seasonally adjusted quarterly
sales for the McGill Company ($Y$, in million dollars) and for the entire industry
($X$, in million dollars), for the most recent 20 quarters.


t:       1      2      3      4      5      6      7
Xt:  127.3  130.0  132.7  129.4  135.0  137.1  140.3
Yt:  20.96  21.40  21.96  21.52  22.39  22.76  23.48

t:       8      9     10     11     12     13     14
Xt:  142.8  145.5  145.3  148.3  146.4  150.2  153.1
Yt:  23.66  24.10  24.01  24.54  24.81  25.00  25.64

t:      15     16     17     18     19     20
Xt:  157.3  160.7  164.2  165.6  168.7  171.9
Yt:  27.36  26.98  27.52  27.78  28.24  28.78

The  first-order  autoregressive  error  model  (13.1)  is  to  be  employed. 

a. Would you expect the autocorrelation parameter $\rho$ to be positive, negative,
or zero here?

b. Fit the linear regression model by ordinary least squares, obtain the residuals,
and plot them against time. What do you find?

c. Conduct a formal test for positive autocorrelation using $\alpha = .05$. State the
alternatives, decision rule, and conclusion.

13.16.  Refer  to  McGill  Company  sales  Problem  13.15. 

a. Use one iteration with the iterative method to estimate the parameters $\beta_0$ and
$\beta_1$ in regression model (13.1). Also obtain the estimated standard deviations
of these estimates.

b. Test whether any positive autocorrelation remains after the first iteration;
use $\alpha = .05$. State the alternatives, decision rule, and conclusion. Does the
iterative approach appear to have been effective here?

c. Estimate $\beta_1$ with a 90 percent confidence interval. Interpret your interval
estimate.

13.17.  Refer  to  McGill  Company  sales  Problems  13.15  and  13.16. 

a. Estimate the regression coefficient $\beta_1$ in model (13.1) by the first differences
approach using a 90 percent confidence interval.

b. Compare your result in part (a) with that in Problem 13.16c. State your
findings.

c. Test whether or not the error terms with the first differences approach are
positively autocorrelated using $\alpha = .05$. State the alternatives, decision
rule, and conclusion.

13.18. A student applying the first differences transformation (13.18a) found that several
$X_t'$ values equaled zero but that the corresponding $Y_t'$ values were nonzero.
Does this signify that the first differences transformation is not apt for the data?


EXERCISES 

13.19.  Derive  (13.7)  for  s  =  2. 

13.20. Refer to the first-order autoregressive error model (13.1). Suppose $Y_t$ is
company's percent share of the market, $X_t$ is company's selling price as a percent of
average competitive selling price, $\beta_0 = 100$, $\beta_1 = -.35$, $\rho = .6$, $\sigma^2 = 1$, and
$\varepsilon_0 = 2.403$. Let $X_t$ and $u_t$ be as follows for $t = 1, \ldots, 10$:

t:      1     2      3       4      5     6      7     8      9    10
Xt:   100   115    120      90     85    75     70    95    105   110
ut:  .764  .509  -.242  -1.808  -.485  .501  -.539  .434  -.299  .030

a. Plot the true regression line. Generate the observations $Y_t$ ($t = 1, \ldots, 10$),
and plot these on the same graph. Fit a least squares regression line to the
observations and plot it also on the same graph. How does your fitted regression
line relate to the true line?

b. Repeat the steps in part (a) but this time let $\rho = 0$. In which of the two cases
does the fitted regression line come closer to the true line? Is this the expected
outcome?

c. Generate the observations $Y_t$ for $\rho = -.7$. For each of the cases $\rho = .6$,
$\rho = 0$, and $\rho = -.7$, obtain the successive error term differences
$\varepsilon_t - \varepsilon_{t-1}$ ($t = 1, \ldots, 10$).

d. For which of the three cases in part (c) is $\sum (\varepsilon_t - \varepsilon_{t-1})^2$ smallest? For which
is it largest? What generalization does this suggest?

13.21. Suppose the autoregressive error process for the model $Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t$ is
that given by (13.19).

a. What would be the transformed variables $Y_t'$ and $X_t'$ to be used with the
iterative method?

b. How would you estimate the parameters $\rho_1$ and $\rho_2$ for use with the iterative
method?


PROJECTS 


13.22. The true regression model is $Y_t = 10 + 24X_t + \varepsilon_t$, where $\varepsilon_t = .8\varepsilon_{t-1} + u_t$ and
$u_t$ are independent $N(0, 25)$.

a. Generate 11 independent random numbers from $N(0, 25)$. Use the first random
number as $\varepsilon_0$, obtain the 10 error terms $\varepsilon_1, \ldots, \varepsilon_{10}$, and then calculate
the 10 observations $Y_1, \ldots, Y_{10}$ corresponding to $X_1 = 1, X_2 = 2, \ldots,
X_{10} = 10$. Fit a linear regression function by ordinary least squares and
calculate MSE.

b. Repeat part (a) 100 times, using new random numbers each time.

c. Calculate the mean of the 100 estimates $b_1$. Does it appear that $b_1$ is an
unbiased estimator of $\beta_1$ despite the presence of positive autocorrelation?

d. Calculate the mean of the 100 estimates MSE. Does it appear that MSE is a
biased estimator of $\sigma^2$? If so, does the magnitude of the bias appear to be
small or large?


CITED  REFERENCES 

13.1 Durbin, J., and G. S. Watson. “Testing for Serial Correlation in Least Squares
Regression. II.” Biometrika 38 (1951), pp. 159-78.

13.2 Theil, H., and A. L. Nagar. “Testing the Independence of Regression Disturbances.”
Journal of the American Statistical Association 56 (1961), pp. 793-806.

13.3 Johnston, J. Econometric Methods. 2d ed. New York: McGraw-Hill, 1971.

13.4 Box, G. E. P., and G. M. Jenkins. Time Series Analysis, Forecasting and Control.
Rev. ed. San Francisco: Holden-Day, 1976.


14 


Nonlinear  regression 


The linear regression models considered up to this point are satisfactory for
most regression applications. There are occasions, however, when a nonlinear
regression model is most suitable. We shall consider in this chapter nonlinear
regression models, how to obtain least squares estimates of the regression parameters
in such models, and how to make inferences about these regression parameters.
The analysis of nonlinear regression models is numerically tedious and is
therefore heavily computer-oriented.

14.1  LINEAR,  INTRINSICALLY  LINEAR,  AND 
NONLINEAR  REGRESSION  MODELS 

Linear  regression  models 

In  previous  chapters,  we  considered  linear  regression  models,  i.e.,  models 
which  are  linear  in  the  parameters.  Such  models  can  be  represented  by: 

(14.1)  $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$

Linear regression models, as we have seen, include not only first-order models in
$p - 1$ independent variables but also more complex models. For instance, a
polynomial regression model in one or more independent variables is linear in the
parameters, such as the following model in two independent variables with linear,
quadratic, and interaction terms:

(14.2)  $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i1}^2 + \beta_3 X_{i2} + \beta_4 X_{i2}^2 + \beta_5 X_{i1} X_{i2} + \varepsilon_i$

Also,  models  with  transformed  variables  that  are  linear  in  the  parameters  belong 
to  the  class  of  linear  regression  models,  such  as  the  following  model: 

(14.3)  $\log_{10} Y_i = \beta_0 + \beta_1 \sqrt{X_{i1}} + \beta_2 \exp(X_{i2}) + \varepsilon_i$


Intrinsically  linear  regression  models 

In  addition  to  the  multitude  of  models  that  are  linear  in  the  parameters,  there 
are  other  models  that,  though  not  linear  in  the  parameters,  can  be  transformed  so 
that  the  parameters  appear  in  linear  fashion.  For  example,  the  exponential  model: 

(14.4)  $Y_i = \gamma_0 [\exp(\gamma_1 X_i)] \varepsilon_i$

is nonlinear in the parameters $\gamma_0$ and $\gamma_1$. However, this model can be transformed
into the linear form (14.1) by using the logarithmic transformation:

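(14.5)  $\log_e Y_i = \log_e \gamma_0 + \gamma_1 X_i + \log_e \varepsilon_i$

Letting:

$\log_e Y_i = Y_i'$
$\log_e \gamma_0 = \beta_0$
$\gamma_1 = \beta_1$
$\log_e \varepsilon_i = \varepsilon_i'$

we can write model (14.5) in the usual form for a linear model:

(14.5a)  $Y_i' = \beta_0 + \beta_1 X_i + \varepsilon_i'$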

We say that model (14.4) is an intrinsically linear model because it can be
expressed in the linear form (14.1) by a suitable transformation. Another intrinsically
linear regression model is:

(14.6)  $Y_i = [\exp(\gamma)] X_i + \varepsilon_i$

If we let $\exp(\gamma) = \beta_1$, we then have a model with the regression through the
origin:

(14.6a)  $Y_i = \beta_1 X_i + \varepsilon_i$

Note 

When an intrinsically linear model has been transformed into the linear model form,
such as model (14.4) transformed into model (14.5a), it is important to study the linearized
model for aptness. For instance, if the error terms in (14.4) are normally distributed,
the transformed error terms $\varepsilon_i'$ in (14.5a) will not be normally distributed.


Nonlinear  regression  models 

Nonlinear regression models are not linear in the parameters and cannot be
made so by transformation. For example, exponential model (14.4) with an additive
error term:

(14.7)  $Y_i = \gamma_0 \exp(\gamma_1 X_i) + \varepsilon_i$

is intrinsically nonlinear because no transformation exists which transforms this
model into the linear form (14.1). A more general nonlinear exponential model
in one independent variable with an additive error term is:

(14.8)  $Y_i = \gamma_0 + \gamma_1 \exp(\gamma_2 X_i) + \varepsilon_i$

This model is commonly used in growth studies where the rate of growth at a
given time $X$ is proportional to the amount of growth remaining as time increases,
with $\gamma_0$ representing the maximum growth value. Model (14.8) is often
employed to relate the concentration of a substance ($Y$) to elapsed time ($X$).
Figure 14.1a shows the response function for exponential model (14.8), for
$\gamma_0 = 100$, $\gamma_1 = -50$, and $\gamma_2 = -2$.

Another nonlinear regression model is the general logistic model with additive
error term:

(14.9)  $Y_i = \dfrac{\gamma_0}{1 + \gamma_1 \exp(\gamma_2 X_i)} + \varepsilon_i$

This model has been used in population studies to relate number of species ($Y$) to
time ($X$). Recall that logistic response function (10.53) was used in Chapter 10
for situations where the dependent variable is a 0, 1 indicator variable. Logistic
response function (10.53) is a special type of logistic function. Figure 14.1b
shows the response function for the general logistic model, for $\gamma_0 = 10$,
$\gamma_1 = 20$, and $\gamma_2 = -2$.

FIGURE 14.1  Graphs of nonlinear regression response functions
(a) Exponential Model (14.8): $E(Y) = 100 - 50 \exp(-2X)$
(b) Logistic Model (14.9): $E(Y) = 10/[1 + 20 \exp(-2X)]$
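The two response functions graphed in Figure 14.1 can be tabulated directly. The following is a minimal sketch in Python, using the parameter values quoted for the figure; the function names and the grid of X values are illustrative assumptions, not part of the text.

```python
import math

def exponential_response(x, g0=100.0, g1=-50.0, g2=-2.0):
    """Mean response of exponential model (14.8): g0 + g1 * exp(g2 * x)."""
    return g0 + g1 * math.exp(g2 * x)

def logistic_response(x, g0=10.0, g1=20.0, g2=-2.0):
    """Mean response of logistic model (14.9): g0 / (1 + g1 * exp(g2 * x))."""
    return g0 / (1.0 + g1 * math.exp(g2 * x))

# Illustrative grid of X values (an assumption; any nonnegative grid works).
for x in [0.0, 0.5, 1.0, 2.0, 4.0]:
    print(x, round(exponential_response(x), 3), round(logistic_response(x), 3))
```

With these parameter values both curves level off as X increases, the exponential toward 100 and the logistic toward 10, which is the limiting behavior the figure panels display.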

Note 

When nonlinear growth models are used for time series data, it is important to ascertain
whether or not the error terms are uncorrelated, just as when linear regression models
are applied to time series data.


14.2  EXAMPLE 

In order to illustrate the analysis of nonlinear regression models, we shall use
a relatively simple two-parameter example. In so doing, we shall be able to
explain the concepts and procedures without overwhelming the reader with details.

A hospital administrator wished to develop a regression model for predicting
the degree of long-term recovery after discharge from the hospital for severely
injured patients. The predictor variable to be utilized is number of days of
hospitalization ($X$), and the dependent variable is a prognosis index for long-term
recovery ($Y$), with large values of the index reflecting a good prognosis. Data for
15 patients were studied; these are presented in Table 14.1.

Initially, the intrinsically linear exponential model (14.4) was fitted to the
data, employing the logarithmic transformation model in (14.5). A residual analysis
for this model suggested, however, that the error variance increases with $X$.
It was therefore decided to use the two-parameter exponential model (14.7) with
an additive error term, which is an intrinsically nonlinear model:

(14.10)  $Y_i = \gamma_0 \exp(\gamma_1 X_i) + \varepsilon_i$

TABLE 14.1  Data for severely injured patients example

Patient    Days Hospitalized    Prognosis Index
   i            $X_i$                $Y_i$
   1              2                   54
   2              5                   50
   3              7                   45
   4             10                   37
   5             14                   35
   6             19                   25
   7             26                   20
   8             31                   16
   9             34                   18
  10             38                   13
  11             45                    8
  12             52                   11
  13             53                    8
  14             60                    4
  15             65                    6


It is now desired to estimate the regression parameters $\gamma_0$ and $\gamma_1$ for this model
and to study the fit of the model.
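The preliminary fit of the intrinsically linear model mentioned above can be sketched in a few lines. The Python code below is an illustration only: it regresses $\log_e Y$ on $X$ for the Table 14.1 data by ordinary least squares and back-transforms the intercept, as in (14.5a). The helper name `fit_line` and the variable names are assumptions made for this sketch.

```python
import math

# Table 14.1: days hospitalized (X) and prognosis index (Y) for 15 patients.
X = [2, 5, 7, 10, 14, 19, 26, 31, 34, 38, 45, 52, 53, 60, 65]
Y = [54, 50, 45, 37, 35, 25, 20, 16, 18, 13, 8, 11, 8, 4, 6]

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Linearized model (14.5a): log_e Y = beta0 + beta1 * X + error.
log_y = [math.log(yi) for yi in Y]
b0, b1 = fit_line(X, log_y)

# Back-transform: gamma0 = exp(beta0), gamma1 = beta1.
g0_start, g1_start = math.exp(b0), b1
print(g0_start, g1_start)
```

Estimates obtained this way are natural candidates for starting values when the additive-error model (14.10) is fitted iteratively, since both models share the same response function.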

14.3  LEAST  SQUARES  ESTIMATION  IN  NONLINEAR  REGRESSION 

We noted in Chapter 2 that the method of least squares requires the minimization
of the criterion $Q$ which for simple linear regression is:

$Q = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$

Those estimates of $\beta_0$ and $\beta_1$ which minimize $Q$ for the given sample observations
$(X_i, Y_i)$ are the least squares estimates and are denoted by $b_0$ and $b_1$. We
also noted that one can search for the least squares estimates by trying various
values of $b_0$ and $b_1$ and evaluating $Q$ each time until the minimum value of $Q$ is
found. Alternatively, one can find analytically the least squares estimates by
differentiating $Q$ with respect to $\beta_0$ and $\beta_1$, setting the derivatives equal to 0, and
solving the normal equations.

The same two basic approaches apply in nonlinear regression. We shall first
consider the use of the normal equations and then the use of direct search procedures.

Normal  equations 

In linear regression, we represent an observation $Y_i$ as the sum of the mean
response and an error term:

(14.11)  $Y_i = E(Y_i) + \varepsilon_i$

where:

$E(Y_i) = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_{p-1} X_{i,p-1}$

Since nonlinear regression models take many different forms, as we saw earlier,
we shall now simply indicate that $E(Y_i)$ is a function of $p$ regression parameters
$\gamma_0, \gamma_1, \ldots, \gamma_{p-1}$ and the $i$th observations on the $X$ variables:

(14.12)  $E(Y_i) = f(\mathbf{X}_i, \boldsymbol{\gamma})$

where:

(14.12a)  $\underset{q \times 1}{\mathbf{X}_i} = \begin{bmatrix} X_{i1} \\ X_{i2} \\ \vdots \\ X_{iq} \end{bmatrix} \qquad \underset{p \times 1}{\boldsymbol{\gamma}} = \begin{bmatrix} \gamma_0 \\ \gamma_1 \\ \vdots \\ \gamma_{p-1} \end{bmatrix}$

We denote the number of $X$ variables by $q$, since the number of $X$ variables in
nonlinear regression is not directly related to the number of parameters, unlike
linear regression. Thus, for model (14.10), we have:

$f(\mathbf{X}_i, \boldsymbol{\gamma}) = \gamma_0 \exp(\gamma_1 X_i)$

where the two ($p = 2$) parameters are $\gamma_0$ and $\gamma_1$ and $\mathbf{X}_i$ consists of one ($q = 1$)
value $X_i$ for the $i$th observation.

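With this notation, we express a nonlinear regression model as follows:

(14.13)  $Y_i = f(\mathbf{X}_i, \boldsymbol{\gamma}) + \varepsilon_i$

The least squares criterion $Q$ can then be written:

(14.14)  $Q = \sum_{i=1}^{n} [Y_i - f(\mathbf{X}_i, \boldsymbol{\gamma})]^2$

The partial derivative of $Q$ with respect to $\gamma_k$ is:

(14.15)  $\dfrac{\partial Q}{\partial \gamma_k} = \sum_{i=1}^{n} -2[Y_i - f(\mathbf{X}_i, \boldsymbol{\gamma})] \left[ \dfrac{\partial f(\mathbf{X}_i, \boldsymbol{\gamma})}{\partial \gamma_k} \right]$

When the $p$ partial derivatives are each set equal to 0 and the parameters $\gamma_k$ are
replaced by the least squares estimates $g_k$, we obtain after some simplification
the $p$ normal equations:

(14.16)  $\sum_{i=1}^{n} Y_i \left[ \dfrac{\partial f(\mathbf{X}_i, \boldsymbol{\gamma})}{\partial \gamma_k} \right]_{\boldsymbol{\gamma} = \mathbf{g}} - \sum_{i=1}^{n} f(\mathbf{X}_i, \mathbf{g}) \left[ \dfrac{\partial f(\mathbf{X}_i, \boldsymbol{\gamma})}{\partial \gamma_k} \right]_{\boldsymbol{\gamma} = \mathbf{g}} = 0 \qquad k = 0, 1, \ldots, p - 1$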
where $\mathbf{g}$ is the vector of the least squares estimates $g_k$:

(14.16a)  $\underset{p \times 1}{\mathbf{g}} = \begin{bmatrix} g_0 \\ g_1 \\ \vdots \\ g_{p-1} \end{bmatrix}$

and the terms in brackets in (14.16) are partial derivatives with the parameters $\gamma_k$
replaced by the least squares estimates $g_k$.

The normal equations (14.16) for nonlinear regression models are nonlinear in
the parameter estimates $g_k$ and are usually difficult to solve, even in the simplest
of cases. Hence, numerical procedures are ordinarily required to obtain a solution
iteratively. To make things still more difficult, multiple solutions may be
present.


Example. In our severely injured patients example, we are employing the
response function:

(14.17)  $f(X_i, \boldsymbol{\gamma}) = \gamma_0 \exp(\gamma_1 X_i)$

Hence, the partial derivatives of $f(X_i, \boldsymbol{\gamma})$ are:

(14.18a)  $\dfrac{\partial f(X_i, \boldsymbol{\gamma})}{\partial \gamma_0} = \exp(\gamma_1 X_i)$

(14.18b)  $\dfrac{\partial f(X_i, \boldsymbol{\gamma})}{\partial \gamma_1} = \gamma_0 X_i \exp(\gamma_1 X_i)$

Replacing $\gamma_0$ and $\gamma_1$ by the respective least squares estimates $g_0$ and $g_1$ in
(14.17), (14.18a), and (14.18b), the normal equations (14.16) therefore are:

$\sum Y_i \exp(g_1 X_i) - \sum g_0 \exp(g_1 X_i) \exp(g_1 X_i) = 0$
$\sum Y_i g_0 X_i \exp(g_1 X_i) - \sum g_0 \exp(g_1 X_i) g_0 X_i \exp(g_1 X_i) = 0$

Upon simplification, the normal equations become:

$\sum Y_i \exp(g_1 X_i) - g_0 \sum \exp(2 g_1 X_i) = 0$
$\sum Y_i X_i \exp(g_1 X_i) - g_0 \sum X_i \exp(2 g_1 X_i) = 0$

These normal equations are not linear in $g_0$ and $g_1$, and no closed-form solution
exists. Thus, numerical methods will be required to find the least squares estimates
iteratively.
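One way to see what "numerical methods" can mean here is sketched below in Python. It is not the procedure used in the text: it solves the first simplified normal equation for $g_0$ at a trial $g_1$, substitutes into the second equation, and locates the root in $g_1$ by bisection. The function names and the search bracket are assumptions chosen for this data set.

```python
import math

# Table 14.1 data.
X = [2, 5, 7, 10, 14, 19, 26, 31, 34, 38, 45, 52, 53, 60, 65]
Y = [54, 50, 45, 37, 35, 25, 20, 16, 18, 13, 8, 11, 8, 4, 6]

def g0_given_g1(g1):
    """Solve the first simplified normal equation for g0 at a fixed g1."""
    num = sum(y * math.exp(g1 * x) for x, y in zip(X, Y))
    den = sum(math.exp(2.0 * g1 * x) for x in X)
    return num / den

def second_equation(g1):
    """Left side of the second simplified normal equation, g0 eliminated."""
    g0 = g0_given_g1(g1)
    return (sum(y * x * math.exp(g1 * x) for x, y in zip(X, Y))
            - g0 * sum(x * math.exp(2.0 * g1 * x) for x in X))

# Bisection on g1. The bracket [-0.10, -0.01] is an assumption; the check
# below confirms the left side changes sign over it for these data.
lo, hi = -0.10, -0.01
assert second_equation(lo) * second_equation(hi) < 0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if second_equation(lo) * second_equation(mid) <= 0:
        hi = mid   # root lies in [lo, mid]
    else:
        lo = mid   # root lies in [mid, hi]

g1_hat = 0.5 * (lo + hi)
g0_hat = g0_given_g1(g1_hat)
print(g0_hat, g1_hat)
```

Eliminating $g_0$ works only because this particular model is linear in $\gamma_0$; for a general nonlinear model all $p$ equations would have to be solved jointly, which is part of the motivation for the direct procedures discussed next.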

Note 

When the error terms in a nonlinear regression model are independent $N(0, \sigma^2)$, the
least squares estimates are the same as the maximum likelihood estimates.


Gauss-Newton  method 

In  many  nonlinear  regression  problems,  it  is  more  practical  to  find  the  least 
squares  estimates  by  direct  search  procedures  rather  than  by  first  obtaining  the 
normal  equations  and  then  using  numerical  methods  to  solve  these  equations 
iteratively.  The  major  statistical  computer  packages  employ  one  or  more  direct 
search  procedures  for  solving  nonlinear  regression  problems. 

The Gauss-Newton method, also called the linearization method, uses a
Taylor series expansion to approximate the nonlinear regression model with linear
terms and then employs ordinary least squares to estimate the parameters.
Iteration of these steps generally leads to a solution to the nonlinear regression
problem.

The Gauss-Newton method begins with initial or starting values for the regression
parameters $\gamma_0, \gamma_1, \ldots, \gamma_{p-1}$. We shall denote these by $g_0^{(0)}, g_1^{(0)}, \ldots,
g_{p-1}^{(0)}$, where the superscript in parentheses denotes the iteration number. The
starting values $g_k^{(0)}$ may be obtained from previous or related studies, theoretical
expectations, or a preliminary search for parameter values that lead to a comparatively
low criterion value $Q$ in (14.14). We shall later discuss in more detail the
choice of the starting values.

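Once the starting values for the parameters have been obtained, we approximate
the mean responses $f(\mathbf{X}_i, \boldsymbol{\gamma})$ for the $n$ observations by the linear terms in
the Taylor series expansion around the starting values $g_k^{(0)}$. We obtain for the $i$th
observation:

(14.19)  $f(\mathbf{X}_i, \boldsymbol{\gamma}) \approx f(\mathbf{X}_i, \mathbf{g}^{(0)}) + \sum_{k=0}^{p-1} \left[ \dfrac{\partial f(\mathbf{X}_i, \boldsymbol{\gamma})}{\partial \gamma_k} \right]_{\boldsymbol{\gamma} = \mathbf{g}^{(0)}} (\gamma_k - g_k^{(0)})$

where:

(14.19a)  $\underset{p \times 1}{\mathbf{g}^{(0)}} = \begin{bmatrix} g_0^{(0)} \\ g_1^{(0)} \\ \vdots \\ g_{p-1}^{(0)} \end{bmatrix}$

is the vector of the parameter starting values. The terms in brackets in (14.19) are
the same partial derivatives of the regression function we encountered earlier in
the normal equations (14.16), but here they are evaluated at $\gamma_k = g_k^{(0)}$ for
$k = 0, 1, \ldots, p - 1$.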
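As a worked instance, consider the two-parameter exponential response function (14.17) of the severely injured patients example, whose partial derivatives are given in (14.18a) and (14.18b). Substituting them into the Taylor expansion gives the following linear approximation; this is a sketch obtained by direct substitution and is not a numbered formula of the text.

```latex
f(X_i, \boldsymbol{\gamma}) \approx
    g_0^{(0)} \exp\!\big(g_1^{(0)} X_i\big)
  + \exp\!\big(g_1^{(0)} X_i\big)\,\big(\gamma_0 - g_0^{(0)}\big)
  + g_0^{(0)} X_i \exp\!\big(g_1^{(0)} X_i\big)\,\big(\gamma_1 - g_1^{(0)}\big)
```

The right side is linear in $\gamma_0$ and $\gamma_1$ once the starting values are fixed, which is what allows ordinary least squares to be applied at each iteration.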

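Let us now simplify the notation as follows:

(14.20a)  $f_i^{(0)} = f(\mathbf{X}_i, \mathbf{g}^{(0)})$

(14.20b)  $\beta_k^{(0)} = \gamma_k - g_k^{(0)}$

(14.20c)  $D_{ik}^{(0)} = \left[ \dfrac{\partial f(\mathbf{X}_i, \boldsymbol{\gamma})}{\partial \gamma_k} \right]_{\boldsymbol{\gamma} = \mathbf{g}^{(0)}}$

The Taylor approximation (14.19) for the $i$th observation mean response then
becomes in this notation:

$f(\mathbf{X}_i, \boldsymbol{\gamma}) \approx f_i^{(0)} + \sum_{k=0}^{p-1} D_{ik}^{(0)} \beta_k^{(0)}$

and an approximation to the nonlinear regression model (14.13):

$Y_i = f(\mathbf{X}_i, \boldsymbol{\gamma}) + \varepsilon_i$

is:

(14.21)  $Y_i \approx f_i^{(0)} + \sum_{k=0}^{p-1} D_{ik}^{(0)} \beta_k^{(0)} + \varepsilon_i$

When we shift the $f_i^{(0)}$ term to the left and denote the difference $Y_i - f_i^{(0)}$ by $Y_i^{(0)}$,
we obtain a linear regression model approximation:

(14.22)  $Y_i^{(0)} \approx \sum_{k=0}^{p-1} D_{ik}^{(0)} \beta_k^{(0)} + \varepsilon_i \qquad i = 1, \ldots, n$

where:

(14.22a)  $Y_i^{(0)} = Y_i - f_i^{(0)}$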

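We shall represent this approximation in matrix form as follows:

(14.23)  $\mathbf{Y}^{(0)} \approx \mathbf{D}^{(0)} \boldsymbol{\beta}^{(0)} + \boldsymbol{\varepsilon}$

where:

(14.23a)  $\underset{n \times 1}{\mathbf{Y}^{(0)}} = \begin{bmatrix} Y_1 - f_1^{(0)} \\ \vdots \\ Y_n - f_n^{(0)} \end{bmatrix}$

(14.23b)  $\underset{n \times p}{\mathbf{D}^{(0)}} = \begin{bmatrix} D_{10}^{(0)} & \cdots & D_{1,p-1}^{(0)} \\ \vdots & & \vdots \\ D_{n0}^{(0)} & \cdots & D_{n,p-1}^{(0)} \end{bmatrix}$

(14.23c)  $\underset{p \times 1}{\boldsymbol{\beta}^{(0)}} = \begin{bmatrix} \beta_0^{(0)} \\ \vdots \\ \beta_{p-1}^{(0)} \end{bmatrix}$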

Note that the approximate model (14.23) is precisely in the form of the general
linear regression model (7.18), with the $\mathbf{D}$ matrix of partial derivatives now
playing the role of the $\mathbf{X}$ matrix. We can therefore estimate the parameters $\boldsymbol{\beta}^{(0)}$
by means of the normal equations for the ordinary linear regression model and
obtain according to (7.21):

(14.24)  $\mathbf{b}^{(0)} = (\mathbf{D}^{(0)\prime} \mathbf{D}^{(0)})^{-1} \mathbf{D}^{(0)\prime} \mathbf{Y}^{(0)}$

where $\mathbf{b}^{(0)}$ is the vector of the least squares estimated regression coefficients. We
use these least squares estimates to obtain revised estimated regression coefficients
by means of (14.20b):

$g_k^{(1)} = g_k^{(0)} + b_k^{(0)}$

where $g_k^{(1)}$ denotes the revised estimate of $\gamma_k$ at the end of the first iteration. In
matrix form, we represent the revision process as follows:

(14.25)  $\mathbf{g}^{(1)} = \mathbf{g}^{(0)} + \mathbf{b}^{(0)}$

At this point, we can examine whether the revised regression coefficients
represent adjustments in the proper direction. The least squares criterion measure
$Q$ in (14.14) for the starting regression coefficients $\mathbf{g}^{(0)}$, to be denoted by $SSE^{(0)}$,
is:

(14.26)  $SSE^{(0)} = \sum_{i=1}^{n} [Y_i - f(\mathbf{X}_i, \mathbf{g}^{(0)})]^2 = \sum_{i=1}^{n} (Y_i - f_i^{(0)})^2$

At the end of the first iteration, the estimated regression coefficients are $\mathbf{g}^{(1)}$, and
the least squares criterion measure, now denoted by $SSE^{(1)}$, is:

(14.27)  $SSE^{(1)} = \sum_{i=1}^{n} [Y_i - f(\mathbf{X}_i, \mathbf{g}^{(1)})]^2 = \sum_{i=1}^{n} (Y_i - f_i^{(1)})^2$

If the Gauss-Newton method is working effectively in the first iteration, $SSE^{(1)}$
should be smaller than $SSE^{(0)}$ since the revised estimated regression coefficients
$\mathbf{g}^{(1)}$ should be better estimates.

Note that the nonlinear regression functions $f(\mathbf{X}_i, \mathbf{g}^{(0)})$ and $f(\mathbf{X}_i, \mathbf{g}^{(1)})$ are
used in calculating $SSE^{(0)}$ and $SSE^{(1)}$, and not the linear approximations from the
Taylor series expansion.

The revised regression coefficients $\mathbf{g}^{(1)}$ are not, of course, the least squares
estimates for the nonlinear regression problem because the fitted model (14.23)
is only an approximation of the nonlinear model. The Gauss-Newton method
therefore repeats the procedure just described, with $\mathbf{g}^{(1)}$ now as the starting
values. This produces a new set of revised estimates, denoted by $\mathbf{g}^{(2)}$, and a new
least squares criterion measure $SSE^{(2)}$. The iterative process is continued until
the difference between successive coefficient estimates $\mathbf{g}^{(s+1)} - \mathbf{g}^{(s)}$ and/or
the difference between successive least squares criterion measures $SSE^{(s+1)} -
SSE^{(s)}$ become negligible. We shall denote the final estimates of the
regression coefficients simply by $\mathbf{g}$ and the final least squares criterion measure,
which is the error sum of squares, by $SSE$.

The  Gauss-Newton  method  works  effectively  in  many  nonlinear  regression 
applications.  In  some  instances,  however,  the  method  may  require  numerous 
iterations  before  converging,  and  in  a  few  cases  it  may  not  converge  at  all. 
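The iteration just described can be sketched compactly for the two-parameter exponential model (14.10). The Python code below is an illustration under stated assumptions rather than the routine of any statistical package: the function names, the stopping tolerance, and the iteration cap are choices made for this sketch, and the starting values are the ones reported for the severely injured patients example.

```python
import math

# Table 14.1 data.
X = [2, 5, 7, 10, 14, 19, 26, 31, 34, 38, 45, 52, 53, 60, 65]
Y = [54, 50, 45, 37, 35, 25, 20, 16, 18, 13, 8, 11, 8, 4, 6]

def f(x, g0, g1):
    """Response function (14.17)."""
    return g0 * math.exp(g1 * x)

def sse(g0, g1):
    """Least squares criterion evaluated with the nonlinear function itself."""
    return sum((y - f(x, g0, g1)) ** 2 for x, y in zip(X, Y))

def gauss_newton_step(g0, g1):
    """One revision: regress the residuals Y(s) on the derivative matrix D(s)."""
    resid = [y - f(x, g0, g1) for x, y in zip(X, Y)]
    d0 = [math.exp(g1 * x) for x in X]            # partial derivative (14.18a)
    d1 = [g0 * x * math.exp(g1 * x) for x in X]   # partial derivative (14.18b)
    # Normal equations (D'D) b = D'Y for the two-column D matrix.
    a00 = sum(u * u for u in d0)
    a01 = sum(u * v for u, v in zip(d0, d1))
    a11 = sum(v * v for v in d1)
    c0 = sum(u * r for u, r in zip(d0, resid))
    c1 = sum(v * r for v, r in zip(d1, resid))
    det = a00 * a11 - a01 * a01
    b0 = (a11 * c0 - a01 * c1) / det
    b1 = (a00 * c1 - a01 * c0) / det
    return g0 + b0, g1 + b1                       # revision as in (14.25)

g0, g1 = 56.6646, -0.03797    # starting values reported in the example
history = [(g0, g1, sse(g0, g1))]
for _ in range(50):           # iteration cap is an arbitrary safeguard
    g0, g1 = gauss_newton_step(g0, g1)
    history.append((g0, g1, sse(g0, g1)))
    # Stop when successive criterion measures differ negligibly (assumed tolerance).
    if abs(history[-2][2] - history[-1][2]) < 1e-10:
        break

for s, (a, b, q) in enumerate(history):
    print(s, round(a, 4), round(b, 5), round(q, 4))
```

The sketch stops on the change in the criterion measure only; a practical routine might also monitor the change in the coefficient estimates, as the text notes, and guard against steps that increase the criterion.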

Example. In our severely injured patients example, initial values of the
parameters $\gamma_0$ and $\gamma_1$ were taken to be the estimates of these parameters when the
logarithmic transformation model (14.5) of the intrinsically linear exponential
model (14.4) was fitted. These initial values are $g_0^{(0)} = 56.6646$ and $g_1^{(0)} =
-.03797$ (calculations not shown). The least squares criterion measure at this
stage requires evaluation of the nonlinear regression function (14.17) for each
observation, utilizing the starting parameter values $g_0^{(0)}$ and $g_1^{(0)}$. For instance,
for the first observation, for which $X_1 = 2$, we obtain:

$f(X_1, \mathbf{g}^{(0)}) = f_1^{(0)} = g_0^{(0)} \exp(g_1^{(0)} X_1)$
$= (56.6646) \exp[-.03797(2)]$
$= 52.5208$

Since $Y_1 = 54$, the deviation from the mean response is:

$Y_1^{(0)} = Y_1 - f_1^{(0)} = 54 - 52.5208 = 1.4792$

We see that the deviation $Y_1^{(0)}$ is actually the residual for observation 1 at the
initial fitting stage since $f_1^{(0)}$ is the estimated mean response when the initial
estimates $\mathbf{g}^{(0)}$ of the parameters are employed. The stage 0 residuals for this and
the other sample observations are presented in Table 14.2 and constitute the $\mathbf{Y}^{(0)}$
vector.


The least squares criterion measure at this initial stage then is simply the sum
of the squared stage 0 residuals:

$SSE^{(0)} = \sum (Y_i - f_i^{(0)})^2 = \sum (Y_i^{(0)})^2$
$= (1.4792)^2 + \cdots + (1.1977)^2 = 56.0869$
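The stage 0 calculation above can be reproduced with a few lines of Python. This is a sketch only; the variable names are assumptions, and the starting values are entered as rounded in the text, so the last printed digit may differ slightly from the figures quoted.

```python
import math

# Table 14.1 data and the starting values reported in the example.
X = [2, 5, 7, 10, 14, 19, 26, 31, 34, 38, 45, 52, 53, 60, 65]
Y = [54, 50, 45, 37, 35, 25, 20, 16, 18, 13, 8, 11, 8, 4, 6]
g0_start, g1_start = 56.6646, -0.03797

# Estimated mean responses f_i(0) from the nonlinear function (14.17).
fitted = [g0_start * math.exp(g1_start * x) for x in X]

# Stage 0 residuals Y_i(0) = Y_i - f_i(0), the elements of the Y(0) vector.
residuals = [y - fi for y, fi in zip(Y, fitted)]

# Criterion measure SSE(0): sum of squared stage 0 residuals.
sse0 = sum(r * r for r in residuals)
print(round(fitted[0], 4), round(residuals[0], 4), round(sse0, 4))
```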

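To revise the initial values for the parameters, we require the $\mathbf{D}^{(0)}$ matrix and
the $\mathbf{Y}^{(0)}$ vector. The latter was already obtained in the process of calculating the
least squares criterion measure at the start. To obtain the $\mathbf{D}^{(0)}$ matrix, we need the
partial derivatives of the regression function (14.17) evaluated at $\boldsymbol{\gamma} = \mathbf{g}^{(0)}$. The
partial derivatives are given in (14.18). Table 14.2 shows the $\mathbf{D}^{(0)}$ matrix entries
in symbolic form and also the numerical values. To illustrate the calculations for
observation $i = 1$, we know from Table 14.1 that $X_1 = 2$. Hence, evaluating the
partial derivatives at $\mathbf{g}^{(0)}$, we find:

$\left[ \dfrac{\partial f(X_1, \boldsymbol{\gamma})}{\partial \gamma_0} \right]_{\boldsymbol{\gamma} = \mathbf{g}^{(0)}} = \exp(g_1^{(0)} X_1) = \exp[-.03797(2)] = .92687$

$\left[ \dfrac{\partial f(X_1, \boldsymbol{\gamma})}{\partial \gamma_1} \right]_{\boldsymbol{\gamma} = \mathbf{g}^{(0)}} = g_0^{(0)} X_1 \exp(g_1^{(0)} X_1) = 56.6646(2) \exp[-.03797(2)] = 105.0416$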

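TABLE 14.2  Y(0) and D(0) matrices for severely injured patients example

$$\mathbf{Y}^{(0)}_{15 \times 1} = \begin{bmatrix} Y_1 - f_1^{(0)} \\ \vdots \\ Y_{15} - f_{15}^{(0)} \end{bmatrix} = \begin{bmatrix} Y_1 - g_0^{(0)}\exp(g_1^{(0)} X_1) \\ \vdots \\ Y_{15} - g_0^{(0)}\exp(g_1^{(0)} X_{15}) \end{bmatrix}$$

$$\mathbf{D}^{(0)}_{15 \times 2} = \begin{bmatrix} \exp(g_1^{(0)} X_1) & g_0^{(0)} X_1 \exp(g_1^{(0)} X_1) \\ \vdots & \vdots \\ \exp(g_1^{(0)} X_{15}) & g_0^{(0)} X_{15} \exp(g_1^{(0)} X_{15}) \end{bmatrix}$$

The columns of $\mathbf{D}^{(0)}$ are the partial derivatives of $f(X_i, \boldsymbol{\gamma})$
with respect to $\gamma_0$ and $\gamma_1$, evaluated at $\boldsymbol{\gamma} = \mathbf{g}^{(0)}$.

                          D(0)           D(0)
   i        Y(0)        column 1       column 2
   1       1.4792        .92687        105.0416
   2       3.1337        .82708        234.3317
   3       1.5609        .76660        304.0736
   4      -1.7624        .68407        387.6236
   5       1.6996        .58768        466.2057
   6      -2.5422        .48606        523.3020
   7      -1.1139        .37261        548.9603
   8      -1.4629        .30818        541.3505
   9       2.4172        .27500        529.8162
  10       -.3871        .23625        508.7088
  11      -2.2625        .18111        461.8140
  12       3.1327        .13884        409.0975
  13        .4259        .13367        401.4294
  14      -1.8063        .10247        348.3801
  15       1.1977        .08475        312.1510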
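The stage 0 quantities can be checked with a short Python sketch. It is only an illustration, not the authors' computation. The X and Y values below are an assumption: Table 14.1 is not reproduced in this section, so they were reconstructed from Table 14.2 (each X from the first column of D(0), each Y as residual plus fitted value). The function name `stage_quantities` is likewise illustrative.

```python
import math

# Data reconstructed from Table 14.2 (an assumption; Table 14.1 is not shown here).
X = [2, 5, 7, 10, 14, 19, 26, 31, 34, 38, 45, 52, 53, 60, 65]
Y = [54, 50, 45, 37, 35, 25, 20, 16, 18, 13, 8, 11, 8, 4, 6]

G0_START, G1_START = 56.6646, -0.03797   # starting values g(0) quoted in the text


def stage_quantities(g0, g1, xs, ys):
    """Return (residuals, derivative rows, SSE) for f(X, g) = g0 * exp(g1 * X)."""
    residuals = [y - g0 * math.exp(g1 * x) for x, y in zip(xs, ys)]
    # Each row holds (df/dgamma0, df/dgamma1) evaluated at the current estimates.
    d_rows = [(math.exp(g1 * x), g0 * x * math.exp(g1 * x)) for x in xs]
    sse = sum(r * r for r in residuals)
    return residuals, d_rows, sse


resid0, d0, sse0 = stage_quantities(G0_START, G1_START, X, Y)
print("first residual:", round(resid0[0], 4))
print("first row of D(0):", tuple(round(v, 4) for v in d0[0]))
print("SSE(0):", round(sse0, 4))
```

Because the starting values are entered to only four or five significant digits, the printed figures should agree with Table 14.2 closely but not necessarily in the last decimal place.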



Hence, $g_0^{(1)} = 58.5578$ and $g_1^{(1)} = -.03953$ are the revised parameter
estimates at the end of the first iteration. Note that the estimated regression
coefficients have been revised moderately from the initial values, as can be
seen from Table 14.3a, which presents the estimated regression coefficients as
well as the least squares criterion measures for the first three iterations.
Note also that the least squares criterion measure has been reduced in the first
iteration.

While iteration 1 led to moderate revisions in the estimated regression
coefficients and a substantially better fit according to the least squares
criterion, Table 14.3a indicates that iteration 2 resulted only in minor
revisions of the estimated regression coefficients and little improvement in the
fit. Iteration 3 led to no change in either the estimates of the coefficients or
the least squares criterion measure.

Hence, the search procedure was terminated after three iterations. The final
regression coefficient estimates therefore are $g_0 = 58.6065$ and
$g_1 = -.03959$, and the fitted regression model is:

(14.28)  $\hat{Y} = (58.6065)\exp(-.03959X)$

The error sum of squares for this fitted model is SSE = 49.4593. Figure 14.2
presents a scatter plot of the data and the estimated regression function.
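The iterations summarized in Table 14.3a can be sketched in a few lines of Python. This is a minimal illustration of the Gauss-Newton step for the two-parameter exponential model, solving the 2 x 2 normal equations (D'D)b = D'Y directly. It is not the program the authors used. The data are again the values reconstructed from Table 14.2 (an assumption), and the function name `gauss_newton` and its stopping rule are illustrative choices.

```python
import math

# Data reconstructed from Table 14.2 (an assumption; Table 14.1 is not shown here).
X = [2, 5, 7, 10, 14, 19, 26, 31, 34, 38, 45, 52, 53, 60, 65]
Y = [54, 50, 45, 37, 35, 25, 20, 16, 18, 13, 8, 11, 8, 4, 6]


def gauss_newton(g0, g1, xs, ys, max_iter=50, tol=1e-10):
    """Gauss-Newton iterations for f(X, g) = g0 * exp(g1 * X)."""
    history = []
    for _ in range(max_iter):
        e = [math.exp(g1 * x) for x in xs]
        resid = [y - g0 * ei for y, ei in zip(ys, e)]   # Y vector at this stage
        d0 = e                                          # df/dgamma0
        d1 = [g0 * x * ei for x, ei in zip(xs, e)]      # df/dgamma1
        history.append((g0, g1, sum(r * r for r in resid)))
        # Elements of D'D and D'Y for the two-parameter case.
        a00 = sum(v * v for v in d0)
        a01 = sum(u * v for u, v in zip(d0, d1))
        a11 = sum(v * v for v in d1)
        c0 = sum(u * r for u, r in zip(d0, resid))
        c1 = sum(u * r for u, r in zip(d1, resid))
        det = a00 * a11 - a01 * a01
        b0 = (a11 * c0 - a01 * c1) / det
        b1 = (a00 * c1 - a01 * c0) / det
        g0, g1 = g0 + b0, g1 + b1                       # revised estimates
        if abs(b0) < tol and abs(b1) < tol:
            break
    sse = sum((y - g0 * math.exp(g1 * x)) ** 2 for x, y in zip(xs, ys))
    return g0, g1, sse, history


g0_hat, g1_hat, sse_hat, history = gauss_newton(56.6646, -0.03797, X, Y)
for k, (a, b, s) in enumerate(history[:4]):
    print(k, round(a, 4), round(b, 5), round(s, 4))
print("final:", round(g0_hat, 4), round(g1_hat, 5), round(sse_hat, 4))
```

The sketch iterates until the revisions are negligible rather than stopping after three iterations, so it performs more iterations than Table 14.3a reports, although the estimates should agree to the precision shown there.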




TABLE 14.3  Gauss-Newton method iterations to obtain nonlinear least
squares estimates: severely injured patients example

(a) Estimates of Parameters and Least Squares Criterion Measure

  Iteration        g0           g1          SSE
      0         56.6646      -.03797      56.0869
      1         58.5578      -.03953      49.4638
      2         58.6055      -.03959      49.4593
      3         58.6065      -.03959      49.4593

(b) Final Least Squares Estimates

      k           g_k         s(g_k)
      0         58.6065       1.472
      1         -.03959       .00171

  MSE = 49.4593 / 13 = 3.80456

(c) Estimated Approximate Variance-Covariance Matrix
of Estimated Regression Coefficients

$$\mathbf{s}^2(\mathbf{g}) = MSE(\mathbf{D}'\mathbf{D})^{-1} = 3.80456 \begin{bmatrix} 5.696\text{E}-1 & -4.682\text{E}-4 \\ -4.682\text{E}-4 & 7.697\text{E}-7 \end{bmatrix} = \begin{bmatrix} 2.1672 & -1.781\text{E}-3 \\ -1.781\text{E}-3 & 2.928\text{E}-6 \end{bmatrix}$$


FIGURE 14.2  Plot of data and fitted nonlinear regression function:
severely injured patients example
(Figure not reproduced. Horizontal axis: Days Hospitalized.)


A plot of the residuals against the fitted values $\hat{Y}$ (not shown) did not
suggest any serious departures from the model assumptions, and thus exponential
model (14.17) and the estimated regression function (14.28) were accepted for
the prognosis analysis for severely injured patients.

Comments 

1.  The choice of initial starting values is very important with the Gauss-Newton
method because a poor choice may result in slow convergence, convergence to a local
minimum, or even divergence. Good starting values will generally result in faster
convergence, and if multiple minima exist, will lead to a solution that is the global
minimum rather than a local minimum.

2.  A variety of methods are available for obtaining starting values for the regression
parameters. Often, experience can be utilized to provide good starting values for the
regression parameters. Another possibility is to select p representative observations, set
the regression function $f(\mathbf{X}_i, \boldsymbol{\gamma})$ equal to $Y_i$ for each of the p
observations (thereby ignoring the random error), solve the p equations for the p
parameters, and use the solutions as the starting values, provided they lead to
reasonably good fits of the observed data. Still another possibility is to do a grid
search in the parameter space by selecting in a grid fashion various trial choices of
$\mathbf{g}$, evaluating the least squares criterion Q for each of these choices, and using
as the starting values that $\mathbf{g}$ vector for which Q is smallest.

3.  When  using  the  Gauss-Newton  or  some  other  direct  search  procedure,  it  is  often 
desirable  to  try  other  sets  of  starting  values  after  a  solution  has  been  obtained  to  make  sure 
that  the  same  solution  will  be  found. 

4.  Some  computer  packages  for  nonlinear  regression  require  that  the  user  specify  the 
starting  values  for  the  regression  parameters.  Others  do  a  grid  search  to  obtain  starting 
values. 

5.  Some nonlinear regression computer programs using the Gauss-Newton method
require the user to input the partial derivatives of the regression function, while others
numerically calculate estimated partial derivatives from the regression function. In
addition, most nonlinear computer programs have a library of commonly used regression
functions which need only be specified by the user.

6.  The Gauss-Newton method may produce iterations which oscillate widely or result
in increases in the error sum of squares. Sometimes, these are only temporary but
occasionally serious convergence problems exist. Various modifications of the Gauss-Newton
method have been suggested to improve its performance, such as the Hartley modification
(Ref. 14.1).
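The grid search described in Comment 2 can be sketched as follows for the severely injured patients example. The trial values in `G0_GRID` and `G1_GRID` are hypothetical choices made only for illustration, and the data are the values reconstructed from Table 14.2 (an assumption, since Table 14.1 is not shown here).

```python
import math

# Data reconstructed from Table 14.2 (an assumption; Table 14.1 is not shown here).
X = [2, 5, 7, 10, 14, 19, 26, 31, 34, 38, 45, 52, 53, 60, 65]
Y = [54, 50, 45, 37, 35, 25, 20, 16, 18, 13, 8, 11, 8, 4, 6]

# Hypothetical trial values for the two parameters.
G0_GRID = [40.0, 50.0, 60.0, 70.0]
G1_GRID = [-0.08, -0.06, -0.04, -0.02]


def q(g0, g1):
    """Least squares criterion Q for f(X, g) = g0 * exp(g1 * X)."""
    return sum((y - g0 * math.exp(g1 * x)) ** 2 for x, y in zip(X, Y))


# Evaluate Q at every grid point and keep the point where Q is smallest.
q_best, g0_best, g1_best = min(
    (q(a, b), a, b) for a in G0_GRID for b in G1_GRID
)
print("starting values from grid:", g0_best, g1_best, "Q =", round(q_best, 3))
```

A coarse grid such as this one is only meant to land in the neighborhood of the minimum; the selected point would then be passed to the iterative procedure as g(0).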


Other  direct  search  procedures 

A number of other direct search procedures besides the Gauss-Newton
method are frequently used. One is the method of steepest descent. This method
searches for the minimum least squares criterion measure Q by iteratively
determining the direction in which the regression coefficients $\mathbf{g}$ should be
changed. The method of steepest descent is particularly effective when the
starting values $\mathbf{g}^{(0)}$ are not good and are far from the final values
$\mathbf{g}$.

The Marquardt algorithm seeks to utilize the best features of the Gauss-Newton
method and the method of steepest descent, and occupies a middle ground between
these two methods.

Additional information about direct search procedures can be found in
specialized sources, such as References 14.2 and 14.3.
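To make the idea of a middle ground concrete, here is a hedged Python sketch of a Marquardt-type step. The text gives no formulas for the Marquardt algorithm, so the form used below is an assumption: it is one common presentation, in which a damping constant `lam` is added to the diagonal of D'D, reduced after a step that lowers SSE, and increased otherwise. With `lam` near zero the step is the Gauss-Newton step; with `lam` large it becomes a short step along the steepest descent direction. The data are the values reconstructed from Table 14.2 (also an assumption).

```python
import math

# Data reconstructed from Table 14.2 (an assumption; Table 14.1 is not shown here).
X = [2, 5, 7, 10, 14, 19, 26, 31, 34, 38, 45, 52, 53, 60, 65]
Y = [54, 50, 45, 37, 35, 25, 20, 16, 18, 13, 8, 11, 8, 4, 6]


def sse(g0, g1):
    return sum((y - g0 * math.exp(g1 * x)) ** 2 for x, y in zip(X, Y))


def marquardt_fit(g0, g1, lam=1e-3, max_iter=100, tol=1e-10):
    """Damped Gauss-Newton iterations: solve (D'D + lam*I) b = D'Y."""
    current = sse(g0, g1)
    for _ in range(max_iter):
        e = [math.exp(g1 * x) for x in X]
        resid = [y - g0 * ei for y, ei in zip(Y, e)]
        d1 = [g0 * x * ei for x, ei in zip(X, e)]
        a00 = sum(v * v for v in e) + lam        # damped diagonal element
        a01 = sum(u * v for u, v in zip(e, d1))
        a11 = sum(v * v for v in d1) + lam       # damped diagonal element
        c0 = sum(u * r for u, r in zip(e, resid))
        c1 = sum(u * r for u, r in zip(d1, resid))
        det = a00 * a11 - a01 * a01
        b0 = (a11 * c0 - a01 * c1) / det
        b1 = (a00 * c1 - a01 * c0) / det
        trial = sse(g0 + b0, g1 + b1)
        if trial <= current:
            # Step accepted: move, and lean further toward Gauss-Newton.
            g0, g1, current = g0 + b0, g1 + b1, trial
            lam /= 10.0
            if abs(b0) < tol and abs(b1) < tol:
                break
        else:
            # Step rejected: increase damping, which shortens the step.
            lam *= 10.0
    return g0, g1, current


g0_lm, g1_lm, sse_lm = marquardt_fit(56.6646, -0.03797)
print(round(g0_lm, 4), round(g1_lm, 5), round(sse_lm, 4))
```

The design choice of multiplying or dividing `lam` by 10 is arbitrary here; any rule that raises the damping after a failed step and lowers it after a successful one has the same qualitative effect.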


14.4  INFERENCES  ABOUT  NONLINEAR  REGRESSION  PARAMETERS 


Estimated  variances  and  covariances 

Inferences about nonlinear regression parameters require an estimate of the
error term variance $\sigma^2$. This estimate is the same as for linear regression:

(14.29)  $MSE = \dfrac{SSE}{n - p} = \dfrac{\sum (Y_i - \hat{Y}_i)^2}{n - p} = \dfrac{\sum [Y_i - f(\mathbf{X}_i, \mathbf{g})]^2}{n - p}$

where $\mathbf{g}$ is the vector of the final parameter estimates. For nonlinear
regression, MSE is not an unbiased estimator of $\sigma^2$, but the bias is small when
the sample size is large.

When the error terms are independent and normally distributed and the sample
size is reasonably large, the following theorem is helpful:

(14.30)  When the error terms $\varepsilon_i$ are independent $N(0, \sigma^2)$ and the
         sample size n is reasonably large, the sampling distribution of
         $\mathbf{g}$ is approximately normal with:

         $E(\mathbf{g}) \approx \boldsymbol{\gamma}$

Thus, when the sample size is large the least squares estimators $\mathbf{g}$ for
nonlinear regression are approximately normally distributed and unbiased. An
estimate of the approximate variance-covariance matrix of the regression
coefficients is:

(14.31)  $\mathbf{s}^2(\mathbf{g}) = MSE(\mathbf{D}'\mathbf{D})^{-1}$

where $\mathbf{D}$ is the matrix of partial derivatives evaluated at the final least
squares estimates $\mathbf{g}$, just as $\mathbf{D}^{(0)}$ in (14.23a) is the matrix of
partial derivatives evaluated at $\mathbf{g}^{(0)}$.

Note that the estimated approximate variance-covariance matrix
$\mathbf{s}^2(\mathbf{g})$ is of exactly the same form as the one for linear regression
in (7.39), with $\mathbf{D}$ again playing the role of the $\mathbf{X}$ matrix.


Example. For our severely injured patients example, we know from Table
14.3a that the final error sum of squares is SSE = 49.4593. Since p = 2
parameters are present in regression model (14.17), we have:

$$MSE = \frac{SSE}{n - p} = \frac{49.4593}{15 - 2} = 3.80456$$

Table 14.3b presents this mean square, and Table 14.3c contains the estimated
variance-covariance matrix of the regression coefficients. The matrix
$(\mathbf{D}'\mathbf{D})^{-1}$ is based on the final regression coefficient estimates
$\mathbf{g}$ and is shown without computational details.

We see from Table 14.3c that $s^2(g_0) = 2.1672$ and $s^2(g_1) = .000002928$.
The estimated standard deviations of the regression coefficients are given in
Table 14.3b.
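Since the computational details for $(\mathbf{D}'\mathbf{D})^{-1}$ are omitted, the following Python sketch shows one way they might be carried out for the two-parameter case, using (14.29) and (14.31). The data are the values reconstructed from Table 14.2 (an assumption), the final estimates are entered as rounded in Table 14.3, and all variable names are illustrative.

```python
import math

# Data reconstructed from Table 14.2 (an assumption; Table 14.1 is not shown here).
X = [2, 5, 7, 10, 14, 19, 26, 31, 34, 38, 45, 52, 53, 60, 65]
Y = [54, 50, 45, 37, 35, 25, 20, 16, 18, 13, 8, 11, 8, 4, 6]

g0, g1 = 58.6065, -0.03959      # final estimates as rounded in Table 14.3
n, p = len(X), 2

e = [math.exp(g1 * x) for x in X]
resid = [y - g0 * ei for y, ei in zip(Y, e)]
mse = sum(r * r for r in resid) / (n - p)             # (14.29)

# D'D, with D evaluated at the final estimates.
a00 = sum(ei * ei for ei in e)
a01 = sum(g0 * x * ei * ei for x, ei in zip(X, e))
a11 = sum((g0 * x * ei) ** 2 for x, ei in zip(X, e))
det = a00 * a11 - a01 * a01
dtd_inv = [[a11 / det, -a01 / det], [-a01 / det, a00 / det]]

cov = [[mse * v for v in row] for row in dtd_inv]     # (14.31)
s_g0 = math.sqrt(cov[0][0])
s_g1 = math.sqrt(cov[1][1])
print("MSE:", round(mse, 5))
print("s(g0):", round(s_g0, 3), " s(g1):", round(s_g1, 5))
```

Small discrepancies from Table 14.3 in the last digit are to be expected because the estimates are entered in rounded form.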


Interval estimation of a single $\gamma_k$

When the error terms in the nonlinear regression model (14.13) are independent
and normally distributed, the following approximate result holds when the
sample size is large:

(14.32)  $\dfrac{g_k - \gamma_k}{s(g_k)} \sim t(n - p) \qquad k = 0, 1, \ldots, p - 1$

Hence, approximate $1 - \alpha$ confidence limits for any single $\gamma_k$ are the
usual ones:

(14.33)  $g_k \pm t(1 - \alpha/2;\, n - p)\, s(g_k)$

Example. For our severely injured patients example, it is desired to estimate
$\gamma_1$ with a 95 percent confidence interval. We require t(.975; 13) = 2.160, and
find from Table 14.3b that $g_1 = -.03959$ and $s(g_1) = .00171$. Hence:

$$-.03959 - 2.160(.00171) \le \gamma_1 \le -.03959 + 2.160(.00171)$$

$$-.0433 \le \gamma_1 \le -.0359$$

Thus, we can conclude with 95 percent confidence that $\gamma_1$ is between -.0433
and -.0359.

Simultaneous interval estimation of several $\gamma_k$

Approximate joint confidence regions for the regression parameters in nonlinear
regression can be developed, but they are difficult to interpret except when
p - 1 = 2. Bonferroni joint confidence intervals, on the other hand, are easy to
obtain and interpret, as in linear regression. If m parameters are to be estimated
with family confidence coefficient $1 - \alpha$, the joint Bonferroni confidence limits
are:

(14.34)  $g_k \pm B\, s(g_k)$

where:

(14.34a)  $B = t(1 - \alpha/2m;\, n - p)$

Example. In our severely injured patients example, it is desired to obtain
simultaneous interval estimates for $\gamma_0$ and $\gamma_1$ with a 90 percent family
confidence coefficient. With the Bonferroni procedure we therefore require
separate confidence intervals for the two parameters, each with a 95 percent
statement confidence coefficient. We have already obtained a confidence interval
for $\gamma_1$ with a 95 percent statement confidence coefficient. A 95 percent
statement confidence interval for $\gamma_0$ is:

$$58.6065 - 2.160(1.472) \le \gamma_0 \le 58.6065 + 2.160(1.472)$$

$$55.43 \le \gamma_0 \le 61.79$$

Hence, the joint confidence intervals with family confidence coefficient of 90
percent are:

$$55.4 \le \gamma_0 \le 61.8$$
$$-.0433 \le \gamma_1 \le -.0359$$

Test concerning a single $\gamma_k$

A test concerning a single $\gamma_k$ is set up in the usual fashion. To test:

(14.35a)  $H_0\colon \gamma_k = \gamma_{k0}$
          $H_a\colon \gamma_k \ne \gamma_{k0}$

where $\gamma_{k0}$ is the specified value of $\gamma_k$, we may use the usual $t^*$ test
statistic when n is reasonably large:

(14.35b)  $t^* = \dfrac{g_k - \gamma_{k0}}{s(g_k)}$

and the decision rule:

(14.35c)  If $|t^*| \le t(1 - \alpha/2;\, n - p)$, conclude $H_0$
          Otherwise conclude $H_a$

Example. In our severely injured patients example, we wish to test:

$H_0\colon \gamma_0 = 50$
$H_a\colon \gamma_0 \ne 50$

The test statistic (14.35b) here is:

$$t^* = \frac{58.6065 - 50}{1.472} = 5.85$$

For $\alpha = .01$, we require t(.995; 13) = 3.012. Since $|t^*| = 5.85 > 3.012$, we
conclude $H_a$, that $\gamma_0 \ne 50$.
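The test statistic (14.35b) and decision rule (14.35c) can be written as a small function. As before, the critical value is taken from the text rather than computed, and the function name `t_test_single` is illustrative.

```python
def t_test_single(g_k, gamma_k0, s_gk, t_critical):
    """Return (t*, conclusion) following (14.35b) and (14.35c)."""
    t_star = (g_k - gamma_k0) / s_gk
    conclusion = "H0" if abs(t_star) <= t_critical else "Ha"
    return t_star, conclusion


# Values from the example: g0 = 58.6065, s(g0) = 1.472, t(.995; 13) = 3.012.
t_star, conclusion = t_test_single(58.6065, 50.0, 1.472, 3.012)
print(round(t_star, 2), conclusion)
```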


Test concerning several $\gamma_k$

When a test is desired concerning several $\gamma_k$ simultaneously, we use the same
approach as for the general linear test, first fitting the full model and obtaining
SSE(F), then fitting the reduced model and obtaining SSE(R), and finally
calculating the same test statistic as for linear regression:

(14.36)  $F^* = \dfrac{SSE(R) - SSE(F)}{df_R - df_F} \div MSE(F)$

For large n, this test statistic is distributed approximately as
$F(df_R - df_F,\, df_F)$ when $H_0$ holds.
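As an illustration of (14.36), the sketch below computes F* for a hypothetical null hypothesis that specifies both parameters of the severely injured patients model, $H_0\colon \gamma_0 = 60,\ \gamma_1 = -.04$. These null values are made up purely for illustration and do not appear in the text. Under this hypothesis the reduced model has no parameters to estimate, so $df_R = n$. SSE(F) is taken from Table 14.3, and the data are the values reconstructed from Table 14.2 (an assumption).

```python
import math

# Data reconstructed from Table 14.2 (an assumption; Table 14.1 is not shown here).
X = [2, 5, 7, 10, 14, 19, 26, 31, 34, 38, 45, 52, 53, 60, 65]
Y = [54, 50, 45, 37, 35, 25, 20, 16, 18, 13, 8, 11, 8, 4, 6]


def sse_at(g0, g1):
    """Error sum of squares for f(X, g) = g0 * exp(g1 * X) at fixed (g0, g1)."""
    return sum((y - g0 * math.exp(g1 * x)) ** 2 for x, y in zip(X, Y))


def general_f_statistic(sse_r, df_r, sse_f, df_f):
    """F* = [(SSE(R) - SSE(F)) / (df_R - df_F)] / MSE(F), as in (14.36)."""
    return ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)


n = len(X)
sse_full, df_full = 49.4593, n - 2            # full model, from Table 14.3
sse_reduced, df_reduced = sse_at(60.0, -0.04), n   # hypothetical H0 fixes both parameters

f_star = general_f_statistic(sse_reduced, df_reduced, sse_full, df_full)
print("SSE(R):", round(sse_reduced, 3), " F*:", round(f_star, 3))
```

The resulting F* would then be compared with the appropriate percentile of the F(2, 13) distribution, which is not computed here.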


14.5  LEARNING  CURVE  EXAMPLE 

We shall now present a second example to provide an additional illustration of
the nonlinear regression concepts developed in this chapter. An electronic
products manufacturer undertook the production of a new product in two locations
(location A: coded $X_1 = 1$, location B: coded $X_1 = 0$). Location B has more
modern facilities and hence was expected to be more efficient than location A,
even after the initial learning period. An industrial engineer calculated the
expected unit production cost for a modern facility after learning has occurred.
Weekly unit production costs for each location were then expressed as a fraction
of this expected cost. The reciprocal of this fraction is a measure of relative
efficiency, and this measure was utilized as the efficiency measure in this study.

It is well known that efficiency increases over time when a new product is
produced, and that the improvements eventually slow down and the process
stabilizes. Hence, it was decided to employ an exponential model with an upper
asymptote for expressing the relation between relative efficiency (Y) and time
($X_2$), and to incorporate a constant effect for the difference in the two
production locations. The model decided on was:

(14.37)  $Y_i = \gamma_0 + \gamma_1 X_{i1} + \gamma_3 \exp(\gamma_2 X_{i2}) + \varepsilon_i$

Here, $\gamma_0$ is the upper asymptote for location B as $X_2$ gets large, and
$\gamma_0 + \gamma_1$ is the upper asymptote for location A. The parameters $\gamma_2$ and
$\gamma_3$ reflect the speed of learning, which was expected to be the same in the
two locations.

While weekly data on relative production efficiency for each location were
available, we shall only use observations for selected weeks during the first 90
weeks of production to simplify the presentation. The data on location, week,
and relative efficiency are presented in Table 14.4. Note that learning was
relatively rapid in both locations, and that the relative efficiency in location B
toward the end of the 90-week period even exceeded 1.0, i.e., the actual unit
costs then were lower than the industrial engineer's expected unit cost.

TABLE 14.4  Data for learning curve example

  Observation    Location    Week    Relative Efficiency
       i           X_i1      X_i2           Y_i
       1            1          1            .483
       2            1          2            .539
       3            1          3            .618
       4            1          5            .707
       5            1          7            .762
       6            1         10            .815
       7            1         15            .881
       8            1         20            .919
       9            1         30            .964
      10            1         40            .959
      11            1         50            .968
      12            1         60            .971
      13            1         70            .960
      14            1         80            .967
      15            1         90            .975
      16            0          1            .517
      17            0          2            .598
      18            0          3            .635
      19            0          5            .750
      20            0          7            .811
      21            0         10            .848
      22            0         15            .943
      23            0         20            .971
      24            0         30           1.012
      25            0         40           1.015
      26            0         50           1.007
      27            0         60           1.022
      28            0         70           1.028
      29            0         80           1.017
      30            0         90           1.023

Model (14.37) is nonlinear in the parameters $\gamma_2$ and $\gamma_3$. Hence, a direct
search estimation procedure was to be employed, for which starting values for
the parameters are needed. These were developed partly from past experience,
partly from analysis of the data. Previous studies indicated that $\gamma_3$ should be
in the neighborhood of -.5, so $g_3^{(0)} = -.5$ was used. Since the difference in the
relative efficiencies between locations A and B for a given week tended to
average -.0459 during the 90-week period, a starting value $g_1^{(0)} = -.0459$ was
specified. The largest observed relative efficiency for location B was 1.028, so
that a starting value $g_0^{(0)} = 1.025$ was felt to be reasonable. Only a starting
value for $\gamma_2$ remains to be found. This was chosen by selecting a typical
relative efficiency observation in the middle of the time period,
$Y_{24} = 1.012$, equating it to the response function with $X_{24,1} = 0$,
$X_{24,2} = 30$, and the previous starting values for the other regression
coefficients (thus ignoring the error term):

$$1.012 = 1.025 - (.5)\exp(30\gamma_2)$$

and solving for $\gamma_2$. Thereby the starting value $g_2^{(0)} = -.122$ was obtained.
Tests for several other representative observations yielded similar starting
values, and $g_2^{(0)} = -.122$ was therefore considered to be a reasonable initial
value.
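Solving the equation $1.012 = 1.025 - (.5)\exp(30\gamma_2)$ for $\gamma_2$ takes one logarithm. The short Python sketch below repeats that arithmetic; the variable names are illustrative.

```python
import math

g0_start, g3_start = 1.025, -0.5     # starting values already chosen in the text
y_24, week_24 = 1.012, 30            # observation 24 (location B, so X_24,1 = 0)

# y = g0 + g3 * exp(week * gamma2)  =>  gamma2 = ln((y - g0) / g3) / week
g2_start = math.log((y_24 - g0_start) / g3_start) / week_24
print(round(g2_start, 3))
```

The same two lines can be reused with other representative observations to confirm that they give similar starting values.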

FIGURE 14.3  Plot of data and fitted nonlinear regression functions:
learning curve example
(Figure not reproduced. Horizontal axis: Time (days).)

With the four starting values $g_0^{(0)} = 1.025$, $g_1^{(0)} = -.0459$,
$g_2^{(0)} = -.122$, and $g_3^{(0)} = -.5$, a computer package direct search program
was utilized to obtain the least squares estimates. The resulting least squares
regression function was:

(14.38)  $\hat{Y} = 1.0156 - .04727X_1 - (.5524)\exp(-.1348X_2)$

and the error sum of squares was SSE = .00329, with 30 - 4 = 26 degrees of
freedom. Figure 14.3 presents the scatter plot and the fitted regression functions
for the two locations. Residual plots did not indicate any noticeable departures
from the assumed model.
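The fit reported in (14.38) can be approximated with a self-contained Python sketch. The text does not say which direct search method the computer package used, so the choice here of Gauss-Newton steps with a simple step-halving safeguard is an assumption, as are the helper names `solve` and `gauss_newton`. The data are those of Table 14.4.

```python
import math

WEEKS = [1, 2, 3, 5, 7, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90]
EFF_A = [.483, .539, .618, .707, .762, .815, .881, .919,
         .964, .959, .968, .971, .960, .967, .975]
EFF_B = [.517, .598, .635, .750, .811, .848, .943, .971,
         1.012, 1.015, 1.007, 1.022, 1.028, 1.017, 1.023]
X1 = [1] * 15 + [0] * 15          # location code (A = 1, B = 0)
X2 = WEEKS + WEEKS
Y = EFF_A + EFF_B


def f(g, x1, x2):
    """Response function (14.37): g0 + g1*X1 + g3*exp(g2*X2)."""
    return g[0] + g[1] * x1 + g[3] * math.exp(g[2] * x2)


def sse(g):
    return sum((y - f(g, a, b)) ** 2 for a, b, y in zip(X1, X2, Y))


def solve(a, c):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(c)
    m = [row[:] + [ci] for row, ci in zip(a, c)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[piv] = m[piv], m[k]
        for r in range(k + 1, n):
            factor = m[r][k] / m[k][k]
            for j in range(k, n + 1):
                m[r][j] -= factor * m[k][j]
    b = [0.0] * n
    for k in reversed(range(n)):
        b[k] = (m[k][n] - sum(m[k][j] * b[j] for j in range(k + 1, n))) / m[k][k]
    return b


def gauss_newton(g, max_iter=50, tol=1e-10):
    for _ in range(max_iter):
        rows, resid = [], []
        for a, b, y in zip(X1, X2, Y):
            ex = math.exp(g[2] * b)
            # Partial derivatives with respect to gamma_0, ..., gamma_3.
            rows.append([1.0, float(a), g[3] * b * ex, ex])
            resid.append(y - f(g, a, b))
        dtd = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
        dty = [sum(r[i] * e for r, e in zip(rows, resid)) for i in range(4)]
        step = solve(dtd, dty)
        scale = 1.0
        # Safeguard (an assumption): halve the step while it increases SSE.
        while sse([gi + scale * si for gi, si in zip(g, step)]) > sse(g) and scale > 1e-6:
            scale /= 2.0
        g = [gi + scale * si for gi, si in zip(g, step)]
        if max(abs(scale * si) for si in step) < tol:
            break
    return g


g_hat = gauss_newton([1.025, -0.0459, -0.122, -0.5])
sse_hat = sse(g_hat)
print([round(v, 5) for v in g_hat], round(sse_hat, 5))
```

The estimates are returned in the order $g_0, g_1, g_2, g_3$ and should agree closely with the coefficients in (14.38).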

Special interest existed in the parameter $\gamma_1$, reflecting the effect of
location. A 95 percent confidence interval is to be constructed. We require
t(.975; 26) = 2.056. The computer printout contained the estimated
variance-covariance matrix $\mathbf{s}^2(\mathbf{g})$ from which it was found that
$s(g_1) = \sqrt{.000016885} = .00411$. Hence, the 95 percent confidence interval for
$\gamma_1$ is:

$$-.04727 - 2.056(.00411) \le \gamma_1 \le -.04727 + 2.056(.00411)$$

$$-.0557 \le \gamma_1 \le -.0388$$

Since $\gamma_1$ is seen to be negative, this confidence interval confirms that
location A with its less modern facilities tends to be less efficient.


Note 

When growth or learning curve models are fitted to data constituting repeated
observations on the same unit, such as efficiency data for the same production unit at
different points in time, the error terms may be correlated. Hence, in these situations it
is important to ascertain whether or not a model assuming uncorrelated error terms is
reasonable. In the learning curve example, a plot of the residuals against time order did
not suggest any serious correlations among the error terms.




PROBLEMS 

14.1.  For each of the following models, indicate whether it is a linear regression
model, an intrinsically linear model, or a nonlinear model. In the case of an
intrinsically linear model, state how it can be expressed in the form of (14.1) by
a suitable transformation:

a.  $Y_i = \exp(\gamma_0 + \gamma_1 X_i + \varepsilon_i)$

b.  $Y_i = \exp(\gamma_0 + \gamma_1 X_i) + \varepsilon_i$

c.  $Y_i = \gamma_0 + \dfrac{\gamma_1}{\gamma_0} X_i + \varepsilon_i$

14.2.  For each of the following models, indicate whether it is a linear regression
model, an intrinsically linear model, or a nonlinear model. In the case of an
intrinsically linear model, state how it can be expressed in the form of (14.1) by
a suitable transformation:

a.  $\log_e Y_i = \gamma_0 + \gamma_1 \log_e X_i + \varepsilon_i$

b.  $Y_i = \gamma_0 X_{i1}^{\gamma_1} X_{i2}^{\gamma_2} \varepsilon_i$

c.  $Y_i = \gamma_0 - \gamma_1 \gamma_2^{X_i} + \varepsilon_i$

14.3.  a.  Plot the logistic response function:

$$E(Y) = \frac{300}{1 + (30)\exp(-1.5X)} \qquad X \ge 0$$

b.  What is the asymptote of this response function? For what value of X does
the response function reach 90 percent of its asymptote?

14.4.  a.  Plot the exponential response function:

$$E(Y) = 49 - (30)\exp(-1.1X) \qquad X \ge 0$$

b.  What is the asymptote of this response function? For what value of X does
the response function reach 95 percent of its asymptote?

14.5.  Home computers. A computer manufacturer hired a market research firm to
investigate the relationship between the likelihood a family will purchase a home
computer and the price of the home computer. The data below are based on a
survey of 1,000 heads of households who were asked if they are likely to
purchase a home computer at a given price. Ten prices (X, in hundred dollars) were
studied, and 100 heads randomly selected were assigned to a given price. The
proportion likely to purchase at a given price is denoted by Y.

    i:     1     2     3     4     5     6     7     8     9    10
  X_i:     1   2.5     5    10    20    30    40    50    75   100
  Y_i:   .95   .85   .58   .46   .31   .28   .19   .11   .06   .03

The following exponential model with independent normal error terms was
deemed to be appropriate:

$$Y_i = \gamma_0 + \gamma_2 \exp(-\gamma_1 X_i) + \varepsilon_i$$

a.  To obtain initial estimates of $\gamma_0$, $\gamma_1$, and $\gamma_2$, note that Y approaches
a lower asymptote $\gamma_0$ as X increases without bound. Hence, let $g_0^{(0)} = 0$ and
observe that when we ignore the error term a logarithmic transformation then yields
$Y_i' = \beta_0 + \beta_1 X_i$, where $Y_i' = \log_e Y_i$, $\beta_0 = \log_e \gamma_2$, and
$\beta_1 = -\gamma_1$. Therefore, fit a linear regression function based on the
transformed data and use as initial estimates $g_0^{(0)} = 0$, $g_1^{(0)} = -b_1$, and
$g_2^{(0)} = \exp(b_0)$.

b.  Using the starting values obtained in part (a), find the least squares estimates
of the parameters $\gamma_0$, $\gamma_1$, and $\gamma_2$.

c.  Assume that the number of observations is reasonably large. Obtain
approximate joint confidence intervals for the parameters $\gamma_0$, $\gamma_1$, and
$\gamma_2$ using the Bonferroni procedure and a 90 percent family confidence coefficient.

14.6.  Refer  to  Home  computers  Problem  14.5. 

a.  Plot  the  estimated  nonlinear  regression  function  and  the  data.  Does  the  fit 
appear  to  be  adequate? 

b.  Obtain  the  residuals  and  plot  them  against  the  fitted  values  and  against  X  on 
separate  graphs.  Also  obtain  a  normal  probability  plot.  Does  the  model 
appear  to  be  adequate? 

14.7.  Enzyme kinetics. In an enzyme kinetics study the velocity of a reaction (Y) is
expected to be related to the concentration (X) as follows:

$$Y_i = \frac{\gamma_0 X_i}{\gamma_1 + X_i} + \varepsilon_i$$

Eleven concentrations have been studied and the results follow:

    i:     1     2     3     4     5     6     7     8     9    10    11
  X_i:     1     2     3     4     5   7.5    10    15    20    30    40
  Y_i:   2.1   4.9   6.5   7.0   8.4  10.2  12.5  14.6  16.1  19.7  23.2

a.  To obtain starting values for $\gamma_0$ and $\gamma_1$, observe that when the error
term is ignored we have $Y_i' = \beta_0 + \beta_1 X_i'$, where $Y_i' = 1/Y_i$,
$\beta_0 = 1/\gamma_0$, $\beta_1 = \gamma_1/\gamma_0$, and $X_i' = 1/X_i$. Therefore fit a linear
regression function to the transformed data to obtain initial estimates
$g_0^{(0)} = 1/b_0$ and $g_1^{(0)} = b_1/b_0$.

b.  Using the starting values obtained in part (a), find the least squares estimates
of the parameters $\gamma_0$ and $\gamma_1$.

c.  Assume that the number of observations is reasonably large. (1) Obtain an
approximate 95 percent confidence interval for $\gamma_0$. (2) Test whether or not
$\gamma_1 = 20$; use $\alpha = .05$. State the alternatives, decision rule, and conclusion.

14.8.  Refer  to  Enzyme  kinetics  Problem  14.7. 

a.  Plot  the  estimated  nonlinear  regression  function  and  the  data.  Does  the  fit 
appear  to  be  adequate? 

b.  Obtain  the  residuals  and  plot  them  against  the  fitted  values  and  against  X  on 
separate  graphs.  Also  obtain  a  normal  probability  plot.  What  do  your  plots 
show? 

14.9.  Drug responsiveness. A pharmacologist modeled the responsiveness to a drug
using the following nonlinear regression model:

$$Y_i = \gamma_0 - \frac{\gamma_0}{1 + (X_i/\gamma_2)^{\gamma_1}} + \varepsilon_i$$

X denotes the dose level, in coded form, and Y the responsiveness expressed as
a percent of the maximum possible responsiveness. In the model, $\gamma_0$ is the
expected response at saturation, $\gamma_2$ is the concentration that produces a half
maximal response, and $\gamma_1$ is related to the slope. The data for nine dose levels
follow.

    i:     1     2     3     4     5     6     7     8     9
  X_i:     1     2     3     4     5     6     7     8     9
  Y_i:    .5   2.3   3.4  24.0  54.7  82.1  94.8  96.2  96.4

a.  Obtain least squares estimates of the parameters $\gamma_0$, $\gamma_1$, and $\gamma_2$
using starting values $g_0^{(0)} = 100$, $g_1^{(0)} = 5$, and $g_2^{(0)} = 4.8$.

b.  Assume that the number of observations is reasonably large. Obtain
approximate joint confidence intervals for the parameters $\gamma_0$, $\gamma_1$, and
$\gamma_2$ using the Bonferroni procedure with a 91 percent family confidence
coefficient. Interpret your results.

14.10.  Refer  to  Drug  responsiveness  Problem  14.9. 

a.  Plot  the  estimated  nonlinear  regression  function  and  the  data.  Does  the  fit 
appear  to  be  adequate? 

b.  Obtain  the  residuals  and  plot  them  against  the  fitted  values  and  against  X  on 
separate  graphs.  Also  obtain  a  normal  probability  plot.  What  do  your  plots 
show  about  the  adequacy  of  the  regression  model? 

14.11.  Process yield. The yield (Y) of a chemical process depends on the temperature
($X_1$) and pressure ($X_2$). The following nonlinear regression model is expected to
be applicable:

$$Y_i = \gamma_0 X_{i1}^{\gamma_1} X_{i2}^{\gamma_2} + \varepsilon_i$$

Prior to beginning full-scale production, 18 tests were undertaken to study the
process yield for various temperature and pressure combinations. The results
follow.

     i:     1     2     3     4     5     6     7     8     9
  X_i1:     1    10   100     1    10   100     1    10   100
  X_i2:     1     1     1    10    10    10   100   100   100
   Y_i:    12    32   103    20    61   198    38   133   406

     i:    10    11    12    13    14    15    16    17    18
  X_i1:     1    10   100     1    10   100     1    10   100
  X_i2:     1     1     1    10    10    10   100   100   100
   Y_i:     8    38    98    14    56   205    43   128   398

a.  To obtain starting values for $\gamma_0$, $\gamma_1$, and $\gamma_2$, note that when we
ignore the random error term, a logarithmic transformation yields
$Y_i' = \beta_0 + \beta_1 X_{i1}' + \beta_2 X_{i2}'$, where $Y_i' = \log_{10} Y_i$,
$\beta_0 = \log_{10} \gamma_0$, $\beta_1 = \gamma_1$, $X_{i1}' = \log_{10} X_{i1}$,
$\beta_2 = \gamma_2$, and $X_{i2}' = \log_{10} X_{i2}$. Fit an ordinary first-order multiple
regression model to the transformed data and use as starting values
$g_0^{(0)} = \text{antilog}_{10}\, b_0$, $g_1^{(0)} = b_1$, and $g_2^{(0)} = b_2$.

b.  Using the starting values obtained in part (a), find the least squares estimates
of the parameters $\gamma_0$, $\gamma_1$, and $\gamma_2$.

14.12.  Refer to Process yield Problem 14.11. Assume that the number of observations
is reasonably large so that large-sample theory is applicable.

a.  Test the hypotheses $H_0\colon \gamma_1 = \gamma_2$ against $H_a\colon \gamma_1 \ne \gamma_2$ using
the .05 level of significance. State the alternatives, decision rule, and conclusion.

b.  Obtain approximate joint confidence intervals for the parameters $\gamma_1$ and
$\gamma_2$ using the Bonferroni procedure and a 95 percent family confidence
coefficient.

c.  What do you conclude about the parameters $\gamma_1$ and $\gamma_2$ based on the
results in parts (a) and (b)?

14.13.  Refer  to  Process  yield  Problem  14.11. 

a.  Plot  the  estimated  nonlinear  regression  function  and  the  data.  Does  the  fit 
appear  to  be  adequate? 

b.  Obtain the residuals and plot them against $\hat{Y}$, $X_1$, and $X_2$ on separate
graphs. Also obtain a normal probability plot. What do your plots show about the
adequacy of the model?

14.14.  Refer to Process yield Problem 14.11. Conduct a formal approximate test for
lack of fit of the nonlinear regression function. Use $\alpha = .05$ and assume that the
number of observations is reasonably large. State the alternatives, decision rule,
and conclusion.


EXERCISES 


14.15.  (Calculus needed.) Refer to Home computers Problem 14.5. Obtain the least
squares normal equations and show that they are nonlinear in the estimated
regression coefficients $g_0$, $g_1$, and $g_2$.

14.16.  (Calculus needed.) Refer to Enzyme kinetics Problem 14.7. Obtain the least
squares normal equations and show that they are nonlinear in the estimated
regression coefficients $g_0$ and $g_1$.

14.17.  (Calculus needed.) Refer to Process yield Problem 14.11. Obtain the least
squares normal equations and show that they are nonlinear in the estimated
regression coefficients $g_0$, $g_1$, and $g_2$.

14.18.  Refer to Drug responsiveness Problem 14.9.

a.  Assuming that $E(\varepsilon_i) = 0$, show that:

$$E(Y) = \gamma_0 \left( \frac{A}{1 + A} \right)$$

where:

$$A = \exp[\gamma_1(\log_e X - \log_e \gamma_2)] = \exp(\beta_0 + \beta_1 X')$$

and $\beta_0 = -\gamma_1 \log_e \gamma_2$, $\beta_1 = \gamma_1$, and $X' = \log_e X$.

b.  Assuming $\gamma_0$ is known, show that:

$$\frac{E(Y')}{1 - E(Y')} = \exp(\beta_0 + \beta_1 X')$$

where $Y' = Y/\gamma_0$.

c.  What transformation do these results suggest for obtaining a simple linear
regression function in the transformed variables?

d.  How can starting values for finding the least squares estimates of the
nonlinear regression parameters be obtained from the estimates of the linear
regression coefficients?




PROJECTS 

14.19.  Refer  to  Enzyme  kinetics  Problem  14.7.  Starting  values  for  finding  the  least 
squares  estimates  of  the  parameters  of  the  nonlinear  regression  model  are  to  be 
obtained  by  a  grid  search.  The  following  bounds  for  the  two  parameters  have 
been  specified: 

$$5 \le \gamma_0 \le 65$$
$$5 \le \gamma_1 \le 65$$

Obtain  49  grid  points  by  using  all  possible  combinations  of  the  boundary  values 
and  five  other  equally  spaced  points  for  each  parameter  range.  Evaluate  the  least 
squares  criterion  (14.14)  for  each  grid  point  and  identify  the  point  providing  the 
best  fit.  Does  this  point  give  reasonable  starting  values  here? 

14.20.  Refer  to  Process  yield  Problem  14.11.  Starting  values  for  finding  the  least 
squares  estimates  of  the  nonlinear  regression  model  coefficients  are  to  be  ob¬ 
tained  by  a  grid  search.  The  following  bounds  for  the  parameters  have  been 
postulated: 

$$1 \le \gamma_0 \le 21$$
$$.2 \le \gamma_1 \le .8$$
$$.1 \le \gamma_2 \le .7$$

Obtain 27 grid points by using all possible combinations of the boundary values
and the midpoint for each of the parameter ranges. Evaluate the least squares
criterion (14.14) for each grid point and identify the point providing the best fit.
Does  this  point  give  reasonable  starting  values  here? 
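
The grid construction described in Projects 14.19 and 14.20 can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are invented here, the criterion is taken to be the sum of squared deviations, and the response function and data are hypothetical stand-ins, not the models or data of Problems 14.7 and 14.11.

```python
from itertools import product


def equally_spaced(lo, hi, n_points):
    """Return n_points equally spaced values from lo to hi, both included."""
    step = (hi - lo) / (n_points - 1)
    return [lo + k * step for k in range(n_points)]


def make_grid(ranges, n_points):
    """All combinations of n_points equally spaced values per parameter range."""
    axes = [equally_spaced(lo, hi, n_points) for lo, hi in ranges]
    return list(product(*axes))


def least_squares_criterion(f, params, xs, ys):
    """Sum of squared deviations of the observations from the response function."""
    return sum((y - f(x, *params)) ** 2 for x, y in zip(xs, ys))


def best_grid_point(f, grid, xs, ys):
    """Grid point with the smallest value of the least squares criterion."""
    return min(grid, key=lambda p: least_squares_criterion(f, p, xs, ys))


# Grids with the bounds given in the two projects.
grid_49 = make_grid([(5, 65), (5, 65)], 7)                 # 7 x 7 points
grid_27 = make_grid([(1, 21), (0.2, 0.8), (0.1, 0.7)], 3)  # 3 x 3 x 3 points


# Hypothetical response function and made-up, noise-free data (assumptions).
def hypothetical_response(x, g0, g1):
    return g0 * x / (g1 + x)


xs = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
ys = [hypothetical_response(x, 25.0, 15.0) for x in xs]

best = best_grid_point(hypothetical_response, grid_49, xs, ys)
print(len(grid_49), len(grid_27), best)
```

With real data the best grid point is only a starting value; whether it is a reasonable one still has to be judged by comparing it with the final least squares estimates.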


CITED  REFERENCES 

14.1  Hartley, H. O. “The Modified Gauss-Newton Method for the Fitting of Non-linear
Regression Functions by Least Squares.” Technometrics 3 (1961), pp. 269-80.

14.2  Gallant,  A.  R.  “Nonlinear  Regression.”  The  American  Statistician  29  (1975),  pp. 
73-81. 

14.3  Kennedy,  W.  J.,  Jr.,  and  J.  E.  Gentle.  Statistical  Computing.  New  York:  Marcel 
Dekker,  1980. 


15 


Normal  correlation  models 


The purpose of this chapter is to indicate the relation between regression
models and their uses, discussed in Chapters 2-14, and normal correlation models.
We first take up bivariate normal correlation models, and then consider
multivariate normal models.

15.1  DISTINCTION  BETWEEN  REGRESSION  AND  CORRELATION 
MODELS 


As we know, the basic regression models taken up in this book assume that
the independent variables $X_1, \ldots, X_{p-1}$ are fixed constants, and primary interest
exists in making inferences about the dependent variable $Y$ on the basis of the
independent variables.

We saw in Chapter 3, for the case of a single independent variable, that the
regression analysis for a normal error regression model is applicable even when
$X$ is a random variable, provided that the conditional distributions of $Y$ follow
certain specifications and the marginal distribution of $X$ does not involve the
regression model parameters $\beta_0$, $\beta_1$, and $\sigma^2$. Thus, in the case where $X$ is a
random variable, only the conditional distributions of $Y$ were specified, and a
restriction was placed on the marginal distribution of $X$. We did not, however,
seek to completely specify the joint distribution of $X$ and $Y$. While the discussion
in Chapter 3 dealt with only a single independent variable, all of the points apply
to multiple regression models containing a number of independent variables
which are random.

Correlation models, like regression models with random independent variables,
consist of variables all of which are random. Correlation models differ from
regression models by specifying the joint distribution of the variables completely.
Furthermore, the variables in a correlation model play a symmetrical
role, with no one variable automatically designated as the dependent variable.
Correlation models are employed to study the nature of the relations between the
variables, and also may be used for making inferences about any one of the
variables on the basis of the others.

Thus, an analyst may use a correlation model for the two variables “height of
person” and “weight of person” in a study of a sample of persons, each variable
being taken as random. He or she might wish to study the relation between the
two  variables  or  might  be  interested  in  making  inferences  about  weight  of  a 
person  on  the  basis  of  the  person’s  height,  in  making  inferences  about  height  on 
the  basis  of  weight,  or  in  both. 

Other  examples  where  a  correlation  model  may  be  appropriate  are: 

1.  To  study  the  relations  between  service  station  sales  of  gasoline,  auxiliary 
products,  and  repair  services. 

2.  To  study  the  relation  between  company  net  income  determined  by  generally 
accepted  accounting  principles  and  net  income  according  to  tax  regulations. 

3.  To  study  the  relations  between  a  person’s  blood  pressure,  body  temperature, 
and  weight. 

The  correlation  model  most  widely  employed  is  the  normal  correlation  model. 
We  discuss  it  now  for  the  case  of  two  variables. 

15.2  BIVARIATE  NORMAL  DISTRIBUTION 

The normal correlation model for the case of two variables is based on the
bivariate normal distribution. Let us denote the two variables as $Y_1$ and $Y_2$. (We
do not use the notation $X$ and $Y$ in this chapter because both variables play a
symmetrical role in correlation analysis.) We say that $Y_1$ and $Y_2$ are jointly
normally distributed if their joint probability distribution is the bivariate normal
distribution.


Density  function 

The  density  function  for  the  bivariate  normal  distribution  is  as  follows: 


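(15.1)  $$f(Y_1, Y_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho_{12}^2}} \exp\left\{ -\frac{1}{2(1 - \rho_{12}^2)} \left[ \left(\frac{Y_1 - \mu_1}{\sigma_1}\right)^2 - 2\rho_{12}\left(\frac{Y_1 - \mu_1}{\sigma_1}\right)\left(\frac{Y_2 - \mu_2}{\sigma_2}\right) + \left(\frac{Y_2 - \mu_2}{\sigma_2}\right)^2 \right] \right\}$$

Note that this density function involves five parameters: $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$, $\rho_{12}$. We
shall explain the meaning of these parameters shortly. First, let us consider a
graphic representation of the bivariate normal distribution.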
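
As an aid to reading (15.1), the density can be evaluated numerically. The following is a minimal sketch, not part of the original text; the function name and argument names are chosen here for illustration, and only the Python standard library is used.

```python
import math


def bivariate_normal_density(y1, y2, mu1, mu2, sigma1, sigma2, rho12):
    """Evaluate the bivariate normal density (15.1) at the point (y1, y2)."""
    if not (sigma1 > 0 and sigma2 > 0 and -1 < rho12 < 1):
        raise ValueError("need sigma1 > 0, sigma2 > 0, and -1 < rho12 < 1")
    # Standardized deviations of each variable from its mean.
    z1 = (y1 - mu1) / sigma1
    z2 = (y2 - mu2) / sigma2
    one_minus_rho_sq = 1.0 - rho12 ** 2
    # Quadratic form inside the exponent of (15.1).
    quad = (z1 ** 2 - 2.0 * rho12 * z1 * z2 + z2 ** 2) / one_minus_rho_sq
    norm_const = 2.0 * math.pi * sigma1 * sigma2 * math.sqrt(one_minus_rho_sq)
    return math.exp(-0.5 * quad) / norm_const


# Height of the surface at the center for a standardized, uncorrelated case.
peak = bivariate_normal_density(0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0)
print(peak)
```

For this standardized case the height at the center is $1/(2\pi)$, and interchanging the roles of the two variables leaves the density unchanged, which reflects their symmetrical role.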


Graphic  representation 

Figure 15.1 contains a graphic representation of a bivariate normal distribution.
It is a surface in three-dimensional space. For every pair of $(Y_1, Y_2)$ values,
there is a density $f(Y_1, Y_2)$ represented by the height of the surface at that point.
The surface is continuous, and probability corresponds to volume under the
surface.

FIGURE 15.1  Example of bivariate normal distribution (vertical axis: $f(Y_1, Y_2)$)


Marginal  distributions 


If $Y_1$ and $Y_2$ are jointly normally distributed, it can be shown that their marginal
distributions have the following characteristics:

(15.2a)  The marginal distribution of $Y_1$ is normal with mean $\mu_1$ and standard
deviation $\sigma_1$:

$$f_1(Y_1) = \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\left[ -\frac{1}{2}\left(\frac{Y_1 - \mu_1}{\sigma_1}\right)^2 \right]$$

(15.2b)  The marginal distribution of $Y_2$ is normal with mean $\mu_2$ and standard
deviation $\sigma_2$:

$$f_2(Y_2) = \frac{1}{\sqrt{2\pi}\,\sigma_2} \exp\left[ -\frac{1}{2}\left(\frac{Y_2 - \mu_2}{\sigma_2}\right)^2 \right]$$

Thus, when $Y_1$ and $Y_2$ are jointly normally distributed, each of the two variables
by itself is normally distributed. It is not generally true, however, that if $Y_1$
and $Y_2$ are each normally distributed, they must be jointly normally distributed in
accord with (15.1).

Meaning  of  parameters 

The  five  parameters  of  the  bivariate  normal  density  function  (15.1)  have  the 
following  meaning: 

1.  $\mu_1$ and $\sigma_1$ are, respectively, the mean and standard deviation of the marginal
distribution of $Y_1$.

2.  $\mu_2$ and $\sigma_2$ are, respectively, the mean and standard deviation of the marginal
distribution of $Y_2$.

3.  $\rho_{12}$ is the coefficient of correlation between the random variables $Y_1$ and
$Y_2$. It is defined as follows:

(15.3)  $$\rho_{12} = \frac{\sigma_{12}}{\sigma_1 \sigma_2}$$

where $\sigma_{12}$ is the covariance between $Y_1$ and $Y_2$, as defined in (1.19):

(15.4)  $$\sigma_{12} = E[(Y_1 - \mu_1)(Y_2 - \mu_2)]$$

If $Y_1$ and $Y_2$ are independent, $\sigma_{12} = 0$ according to (1.23), so that $\rho_{12} = 0$. If
$Y_1$ and $Y_2$ are positively related (i.e., $Y_1$ tends to be large when $Y_2$ is large, and
small when $Y_2$ is small), $\sigma_{12}$ is positive and so is $\rho_{12}$. On the other hand, if $Y_1$
and $Y_2$ are negatively related (i.e., $Y_1$ tends to be large when $Y_2$ is small, and
vice versa), $\sigma_{12}$ is negative and so is $\rho_{12}$. The coefficient of correlation $\rho_{12}$ is a
pure number, and can take on any value between $-1$ and $+1$ inclusive. It assumes
the value $+1$ if $Y_1$ and $Y_2$ are perfectly positively related in a linear fashion, and $-1$
if the perfect linear relation is a negative one.

Contour  representation 

Bivariate normal distributions frequently are portrayed in terms of a contour
diagram. A contour curve on such a diagram is composed of all the points on the
surface that are equidistant from the $Y_1 Y_2$ plane. To put this another way, a
contour curve is composed of all $(Y_1, Y_2)$ outcomes which have constant density
$f(Y_1, Y_2)$. Thus we can picture a contour as the cross section obtained by slicing a
bivariate normal surface horizontally at a fixed distance above the $Y_1 Y_2$ plane, as
in Figure 15.2.

Figure 15.3 presents a contour diagram for the bivariate normal surface of
Figure 15.1. It is a property of the bivariate normal distribution that all contour
curves are ellipses except when $\rho_{12} = 0$ and $\sigma_1 = \sigma_2$. Note that the ellipses have
a common center at $(\mu_1, \mu_2)$, and have common major and minor axes. Also
note that the higher the horizontal cross section of the surface is above the $Y_1 Y_2$
plane, the smaller is the corresponding contour ellipse.




Figure 15.4 illustrates the effects of different parameter values on the location
and shape of the bivariate normal surface. Note that when $Y_1$ and $Y_2$ are positively
related so that $\rho_{12} > 0$, the principal axis has positive slope, implying that
the surface tends to run along a line with positive slope. When $Y_1$ and $Y_2$ are
negatively related so that $\rho_{12} < 0$, the principal axis has a negative slope, implying
that the surface tends to run along a line with negative slope.

Figure 15.4 also demonstrates how the mean values $\mu_1$ and $\mu_2$ affect the
location of the surface, and how the standard deviations $\sigma_1$ and $\sigma_2$, together with
the correlation coefficient $\rho_{12}$, affect the shape of the surface.


15.3  CONDITIONAL  INFERENCES 

As noted, one principal use of bivariate correlation models is to make conditional
inferences regarding one variable, given the other variable. Suppose $Y_1$
represents a service station’s gasoline sales and $Y_2$ its sales of auxiliary products
and services. We may then wish to predict a service station’s sales of auxiliary
products and services $Y_2$, given that its gasoline sales are $Y_1 = \$5{,}500$.

Such conditional inferences require the use of conditional probability distributions,
which we discuss next.


Conditional probability distributions of $Y_1$

The conditional density function of $Y_1$ for any given value of $Y_2$ is denoted by
$f(Y_1 | Y_2)$ and defined as follows:

(15.5)  $$f(Y_1 | Y_2) = \frac{f(Y_1, Y_2)}{f_2(Y_2)}$$

where $f(Y_1, Y_2)$ is the joint density function of $Y_1$ and $Y_2$, and $f_2(Y_2)$ is the
marginal density function of $Y_2$. When $Y_1$ and $Y_2$ are jointly normally distributed
according to (15.1) so that the marginal density function $f_2(Y_2)$ is given by
(15.2b), it can be shown that:

(15.6)  The conditional probability distribution of $Y_1$ for any given value of
$Y_2$ is normal with mean $\alpha_{1.2} + \beta_{12} Y_2$ and standard deviation $\sigma_{1.2}$:

$$f(Y_1 | Y_2) = \frac{1}{\sqrt{2\pi}\,\sigma_{1.2}} \exp\left[ -\frac{1}{2}\left(\frac{Y_1 - \alpha_{1.2} - \beta_{12} Y_2}{\sigma_{1.2}}\right)^2 \right]$$

The parameters $\alpha_{1.2}$, $\beta_{12}$, and $\sigma_{1.2}$ of the conditional probability distributions
of $Y_1$ are functions of the parameters of the joint probability distribution (15.1),
as follows:

(15.7a)  $$\alpha_{1.2} = \mu_1 - \mu_2 \rho_{12} \frac{\sigma_1}{\sigma_2}$$

(15.7b)  $$\beta_{12} = \rho_{12} \frac{\sigma_1}{\sigma_2}$$

(15.7c)  $$\sigma_{1.2}^2 = \sigma_1^2 (1 - \rho_{12}^2)$$

Important characteristics of conditional distributions. Three important
characteristics of the conditional probability distributions of $Y_1$ are normality,
linear regression, and constant variance. We take up each of these in turn.

1.  The conditional probability distribution of $Y_1$ for any given value of $Y_2$ is
normal. Imagine that we slice a bivariate normal distribution vertically at a given
value of $Y_2$, say, at $Y_{h2}$. That is, we slice it parallel to the $Y_1$ axis. This slicing is
shown in Figure 15.5. The exposed cross section has the shape of a normal
distribution, and after being scaled so that its area is 1, it portrays the conditional
probability distribution of $Y_1$, given that $Y_2 = Y_{h2}$.

FIGURE 15.5  Cross section of bivariate normal distribution at $Y_{h2}$

This property of normality holds no matter what the value $Y_{h2}$ is. Thus, whenever
we slice the bivariate normal distribution parallel to the $Y_1$ axis, we obtain
(after proper scaling) a normal conditional probability distribution.

2.  The means of the conditional probability distributions of $Y_1$ fall on a
straight line, and hence are a linear function of $Y_2$:

(15.8)  $$E(Y_1 | Y_2) = \alpha_{1.2} + \beta_{12} Y_2$$

Here $\alpha_{1.2}$ is the intercept parameter and $\beta_{12}$ the slope parameter. Thus, the
relation between the conditional means and $Y_2$ is given by a linear regression
function.

3.  All conditional probability distributions of $Y_1$ have the same standard deviation
$\sigma_{1.2}$. Thus, no matter where we slice the bivariate normal distribution
parallel to the $Y_1$ axis, the resulting conditional probability distribution (after
scaling to have an area of 1) has the same standard deviation. Hence, constant
variances characterize the conditional probability distributions of $Y_1$.

Equivalence to normal error regression model. Suppose that we select a
random sample of observations $(Y_1, Y_2)$ from a bivariate normal population and
wish to make conditional inferences about $Y_1$, given $Y_2$. The preceding discussion
makes it clear that the normal error regression model (2.25) is entirely
applicable because:

1.  The $Y_1$ observations are independent.

2.  The $Y_1$ observations when $Y_2$ is considered given or fixed are normally
distributed with mean $E(Y_1 | Y_2) = \alpha_{1.2} + \beta_{12} Y_2$ and constant variance $\sigma_{1.2}^2$.
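
To make the relations (15.7a)-(15.7c) and (15.8) concrete, the sketch below computes the conditional parameters of $Y_1$ given $Y_2$ from the five parameters of the joint distribution. The function name and the numerical parameter values are hypothetical, chosen only for illustration.

```python
import math


def conditional_params_y1_given_y2(mu1, mu2, sigma1, sigma2, rho12):
    """Return (alpha_1.2, beta_12, sigma_1.2) following (15.7a)-(15.7c)."""
    beta12 = rho12 * sigma1 / sigma2                   # slope, (15.7b)
    alpha1_2 = mu1 - mu2 * beta12                      # intercept, (15.7a)
    sigma1_2 = sigma1 * math.sqrt(1.0 - rho12 ** 2)    # square root of (15.7c)
    return alpha1_2, beta12, sigma1_2


# Hypothetical joint parameters (assumed values, not from the text).
alpha, beta, sd = conditional_params_y1_given_y2(
    mu1=10.0, mu2=20.0, sigma1=2.0, sigma2=4.0, rho12=0.6)

# Conditional mean of Y1 when Y2 = 24, by (15.8).
cond_mean_at_24 = alpha + beta * 24.0
print(alpha, beta, sd, cond_mean_at_24)
```

For these assumed values the conditional standard deviation is 1.6 against a marginal standard deviation of 2, and it is the same whatever value of $Y_2$ is given, which is the constant variance property noted above.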


Conditional  probability  distributions  of  Y2 


The random variables $Y_1$ and $Y_2$ play symmetrical roles in the bivariate normal
probability distribution (15.1). Hence, it follows:

(15.9)  The conditional probability distribution of $Y_2$ for any given value of
$Y_1$ is normal with mean $\alpha_{2.1} + \beta_{21} Y_1$ and standard deviation $\sigma_{2.1}$:

$$f(Y_2 | Y_1) = \frac{1}{\sqrt{2\pi}\,\sigma_{2.1}} \exp\left[ -\frac{1}{2}\left(\frac{Y_2 - \alpha_{2.1} - \beta_{21} Y_1}{\sigma_{2.1}}\right)^2 \right]$$

The parameters $\alpha_{2.1}$, $\beta_{21}$, and $\sigma_{2.1}$ of the conditional probability distributions
of $Y_2$ are functions of the parameters of the joint probability distribution (15.1),
as follows:

(15.10a)  $$\alpha_{2.1} = \mu_2 - \mu_1 \rho_{12} \frac{\sigma_2}{\sigma_1}$$

(15.10b)  $$\beta_{21} = \rho_{12} \frac{\sigma_2}{\sigma_1}$$

(15.10c)  $$\sigma_{2.1}^2 = \sigma_2^2 (1 - \rho_{12}^2)$$

The parameter $\alpha_{2.1}$ is the intercept of the line of regression of $Y_2$ on $Y_1$, and the
parameter $\beta_{21}$ is the slope of this line.

Again, we find that the conditional correlation model of $Y_2$ for given $Y_1$ is the
equivalent of the normal error regression model (2.25).


Comments 

1.  The notation for the parameters of the conditional correlation models departs somewhat
from our previous notation for regression models. The symbol $\alpha$ is now used to
denote the regression intercept. The subscript 1.2 to $\alpha$ indicates that $Y_1$ is regressed on $Y_2$.
Similarly, the subscript 2.1 to $\alpha$ indicates that $Y_2$ is regressed on $Y_1$. The symbol $\beta_{12}$
indicates that it is the slope in the regression of $Y_1$ on $Y_2$, while $\beta_{21}$ is the slope in the
regression of $Y_2$ on $Y_1$. Finally, $\sigma_{2.1}$ is the standard deviation of the conditional probability
distributions of $Y_2$ for any given $Y_1$, while $\sigma_{1.2}$ is the standard deviation of the conditional
probability distributions of $Y_1$ for any given $Y_2$. This notation can be extended
straightforwardly for multivariate correlation models.

2.  Two distinct regressions are involved in a bivariate normal model, that of $Y_1$ on $Y_2$
when $Y_2$ is fixed and that of $Y_2$ on $Y_1$ when $Y_1$ is fixed. In general, the two regression lines
are not the same. For instance, the two slopes $\beta_{12}$ and $\beta_{21}$ are the same only if $\sigma_1 = \sigma_2$,
as can be seen from (15.7b) and (15.10b).

3.  Figure 15.6 illustrates the relation of the two regression lines to the contour ellipses.
Note that both regression lines go through the point $(\mu_1, \mu_2)$. If $\rho_{12} = 0$, the two
regression lines intersect at right angles. The larger $\rho_{12}$ is in absolute value, the more the two
regression lines come together.


FIGURE  15.6  Illustration  of  relation  between  lines  of  regression  and  contour  ellipses 


Use  of  regression  analysis 

In view of the equivalence of each of the conditional bivariate normal correlation
models (15.6) and (15.9) with the normal error regression model (2.25), all
conditional inferences with these correlation models can be made by means of
the usual regression methods. Thus, if a researcher has data which can be appropriately
described as having been generated from a bivariate normal distribution
and wishes to make inferences about $Y_2$, given a particular value of $Y_1$, the
ordinary regression techniques would be applicable. For example, the regression
equation of $Y_2$ on $Y_1$ would be estimated by means of (2.12), the slope of the
regression equation would be estimated by means of the interval estimate (3.15), a new
observation $Y_2$, given the value of $Y_1$, would be predicted by means of (3.35),
and so on. Computer regression packages can be used in the usual manner. To
avoid notational problems, it may be helpful to relabel the variables according to
regression usage: $Y = Y_2$, $X = Y_1$. Of course, if conditional inferences on $Y_1$ for
given values of $Y_2$ are desired, the notation correspondences would be: $Y = Y_1$,
$X = Y_2$.

Note 

When obtaining interval estimates for the conditional correlation models, the confidence
coefficient refers to repeated samples where pairs of observations $(Y_1, Y_2)$ are obtained
from the bivariate normal population. We noted a similar point for regression
models where the independent variables are random.

15.4  INFERENCES ON $\rho_{12}$

A principal use of correlation models is to study the relationships between the
variables. In a bivariate normal model, the parameter $\rho_{12}$ and its square, $\rho_{12}^2$,
provide information about the degree of relationship between the two variables
$Y_1$ and $Y_2$. Of the two measures, $\rho_{12}^2$ is the more meaningful one.


Coefficient  of  determination 


The square of the coefficient of correlation $\rho_{12}$ is called the coefficient of
determination. We noted earlier in (15.7c) and (15.10c) that:

(15.11a)  $$\sigma_{1.2}^2 = \sigma_1^2 (1 - \rho_{12}^2)$$

(15.11b)  $$\sigma_{2.1}^2 = \sigma_2^2 (1 - \rho_{12}^2)$$

We can rewrite these expressions as follows:

(15.12a)  $$\rho_{12}^2 = \frac{\sigma_1^2 - \sigma_{1.2}^2}{\sigma_1^2}$$

(15.12b)  $$\rho_{12}^2 = \frac{\sigma_2^2 - \sigma_{2.1}^2}{\sigma_2^2}$$

The meaning of $\rho_{12}^2$ is now clear. Consider first (15.12a). $\rho_{12}^2$ measures how
much smaller relatively is the variability in any conditional distribution of $Y_1$, for
a given level of $Y_2$, than is the variability in the marginal distribution of $Y_1$. Thus,
$\rho_{12}^2$ measures the relative reduction in the variability of $Y_1$ associated with the use
of the variable $Y_2$. Correspondingly, (15.12b) shows that $\rho_{12}^2$ also measures the
relative reduction in the variability of $Y_2$ associated with the use of the variable
$Y_1$.

It can be shown that:

(15.13)  $$0 \le \rho_{12}^2 \le 1$$

$\rho_{12}^2 = 0$ if $Y_1$ and $Y_2$ are independent, so that the variances of each variable in the
conditional probability distributions are then no smaller than the variance in the
marginal distribution. $\rho_{12}^2 = 1$ if there is no variability in the conditional
probability distributions for each variable, so that perfect predictions of either variable
can be made from the other.

Note 

The interpretation of $\rho_{12}^2$ as measuring the relative reduction in the conditional
variances as compared with the marginal variance is valid for the case of a bivariate normal
population, but not for many other bivariate populations. Of course, the interpretation
implies nothing in a causal sense.


Point estimators of $\rho_{12}$ and $\rho_{12}^2$

The maximum likelihood estimator of $\rho_{12}$, denoted by $r_{12}$, is given by:

(15.14)  $$r_{12} = \frac{\sum (Y_{i1} - \bar{Y}_1)(Y_{i2} - \bar{Y}_2)}{\left[\sum (Y_{i1} - \bar{Y}_1)^2\right]^{1/2} \left[\sum (Y_{i2} - \bar{Y}_2)^2\right]^{1/2}}$$

This estimator is biased (unless $\rho_{12} = 0$ or 1), but the bias is small if $n$ is large.

The coefficient of determination $\rho_{12}^2$ is estimated by the square of the sample
coefficient of correlation, $r_{12}^2$.
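
The estimator (15.14) is easy to compute directly. The following minimal sketch uses only the Python standard library; the function name and the five pairs of observations are made up for illustration.

```python
import math


def sample_correlation(y1, y2):
    """Sample coefficient of correlation r12, computed as in (15.14)."""
    if len(y1) != len(y2) or len(y1) < 2:
        raise ValueError("need two samples of equal length, n >= 2")
    n = len(y1)
    mean1 = sum(y1) / n
    mean2 = sum(y2) / n
    # Sum of cross products and sums of squares of deviations from the means.
    s12 = sum((a - mean1) * (b - mean2) for a, b in zip(y1, y2))
    s11 = sum((a - mean1) ** 2 for a in y1)
    s22 = sum((b - mean2) ** 2 for b in y2)
    return s12 / math.sqrt(s11 * s22)


# Made-up observations (assumed values, not from the text).
y1 = [1.0, 2.0, 3.0, 4.0, 5.0]
y2 = [2.0, 1.0, 4.0, 3.0, 5.0]

r12 = sample_correlation(y1, y2)
r12_squared = r12 ** 2
print(r12, r12_squared)
```

Here the sum of cross products is 8 and each sum of squares is 10, so $r_{12} = .8$ and the estimated coefficient of determination is $.64$. The function fails with a division by zero if either sample has no variability.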


Note 

The maximum likelihood estimator (15.14) of the population correlation coefficient is
the same as the descriptive coefficient of correlation in (3.73) for the regression model.
With a correlation model, $r_{12}$ is an estimator of a parameter. With a regression model, in
contrast, $r_{12}$ is only a descriptive measure which reflects the proportion of the total sum of
squares that is partitioned into the regression sum of squares. Another difference is that in
the standard regression case where the $X_i$ are fixed, the magnitude of $r_{12}$ can be arbitrarily
affected by the spacing pattern chosen for the $X_i$. For the bivariate normal model, on the
other hand, the magnitude of $r_{12}$ cannot be so affected since neither variable is under the
control of the investigator.


Test whether $\rho_{12} = 0$

When the population is a bivariate normal one, it is frequently desired to test
between:

(15.15)  $$H_0: \rho_{12} = 0 \qquad H_a: \rho_{12} \ne 0$$

The reason for interest in this test is that in the case where $Y_1$ and $Y_2$ are jointly
normally distributed, $\rho_{12} = 0$ implies that $Y_1$ and $Y_2$ are independent.

We can use regression procedures for the test since (15.7b) implies that the
following alternatives are equivalent to those in (15.15):

(15.15a)  $$H_0: \beta_{12} = 0 \qquad H_a: \beta_{12} \ne 0$$

and (15.10b) implies that the following alternatives are also equivalent to the
ones in (15.15):

(15.15b)  $$H_0: \beta_{21} = 0 \qquad H_a: \beta_{21} \ne 0$$

It can be shown that the statistics for testing either (15.15a) or (15.15b) can be
expressed directly in terms of $r_{12}$:

(15.16)  $$t^* = \frac{r_{12}\sqrt{n - 2}}{\sqrt{1 - r_{12}^2}}$$

If $H_0$ holds, $t^*$ follows the $t(n - 2)$ distribution. The appropriate decision rule to
control the Type I error at $\alpha$ is:

(15.17)  If $|t^*| \le t(1 - \alpha/2; n - 2)$, conclude $H_0$
         If $|t^*| > t(1 - \alpha/2; n - 2)$, conclude $H_a$

Test statistic (15.16) is identical to the regression $t^*$ test statistic (3.17).
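
The test statistic (15.16) and decision rule (15.17) can be sketched as follows. The function name and the inputs ($r_{12} = .8$ from $n = 5$ pairs) are hypothetical; the critical value $t(.975; 3) = 3.182$ is entered by hand from a standard $t$ table rather than computed, since the Python standard library has no $t$ distribution.

```python
import math


def correlation_t_statistic(r12, n):
    """Test statistic (15.16): t* = r12 * sqrt(n - 2) / sqrt(1 - r12**2)."""
    if n <= 2:
        raise ValueError("need n > 2 observations")
    if not -1 < r12 < 1:
        raise ValueError("need -1 < r12 < 1")
    return r12 * math.sqrt(n - 2) / math.sqrt(1.0 - r12 ** 2)


# Hypothetical sample result (assumed values, not from the text).
t_star = correlation_t_statistic(0.8, 5)

# t(.975; 3) from a t table, for a two-sided test at alpha = .05 with n - 2 = 3.
critical_value = 3.182

# Decision rule (15.17).
conclusion = "H0" if abs(t_star) <= critical_value else "Ha"
print(t_star, conclusion)
```

With so few observations, even a sample correlation of .8 gives $t^* \approx 2.31$, which does not exceed 3.182, so $H_0$ would be concluded at $\alpha = .05$.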


Interval estimation of $\rho_{12}$

Because the sampling distribution of $r_{12}$ is complicated when $\rho_{12} \ne 0$, interval
estimation of $\rho_{12}$ is usually done by means of a transformation.

$z'$ transformation. This transformation, due to R. A. Fisher, is as follows:

(15.18)  $$z' = \frac{1}{2} \log_e \left( \frac{1 + r_{12}}{1 - r_{12}} \right)$$

When $n$ is large (25 or more is a useful rule of thumb), the distribution of $z'$ is
approximately normal with mean and variance:

(15.19)  $$E(z') = \zeta = \frac{1}{2} \log_e \left( \frac{1 + \rho_{12}}{1 - \rho_{12}} \right)$$

(15.20)  $$\sigma^2(z') = \frac{1}{n - 3}$$

Note that the transformation from $r_{12}$ to $z'$ in (15.18) is the same as the relation
in (15.19) between $\rho_{12}$ and $E(z') = \zeta$. Also note that the variance of $z'$ is a
known constant, depending only on the sample size $n$.

Table A-7 gives paired values for the left and right sides of (15.18) and
(15.19), thus eliminating the need for calculations. For instance, if $r_{12}$ or $\rho_{12}$
equals .25, Table A-7 indicates that $z'$ or $\zeta$ equals .2554, and vice versa. The
values on the two sides of the transformation always have the same sign. Thus, if
$r_{12}$ or $\rho_{12}$ is negative, a minus sign is attached to the value in Table A-7. For
instance, if $r_{12} = -.25$, $z' = -.2554$.
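
Where Table A-7 is not at hand, the transformation (15.18) and its inverse can be computed directly. This is a minimal sketch with function names chosen here for illustration; the inverse uses the identity that solving (15.18) for the correlation gives the hyperbolic tangent of $z'$.

```python
import math


def fisher_z(r):
    """z' transformation (15.18): 0.5 * log_e((1 + r) / (1 - r))."""
    if not -1 < r < 1:
        raise ValueError("need -1 < r < 1")
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def inverse_fisher_z(z):
    """Transform z' (or zeta) back to the correlation scale."""
    return math.tanh(z)


z_prime = fisher_z(0.25)
z_prime_negative = fisher_z(-0.25)
round_trip = inverse_fisher_z(z_prime)
print(round(z_prime, 4), round(z_prime_negative, 4), round_trip)
```

The value for $r_{12} = .25$ rounds to .2554, and the value for $r_{12} = -.25$ is the same with a minus sign attached, in agreement with the table entries quoted above.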




Interval estimate. Since $z'$ is approximately normally distributed for large
$n$, it follows that the standardized statistic:

(15.21)  $$\frac{z' - \zeta}{\sigma(z')}$$

is approximately a standard normal variable when $n$ is large. Therefore, the
$1 - \alpha$ confidence limits for $\zeta$ are:

(15.22)  $$z' \pm z(1 - \alpha/2)\,\sigma(z')$$

where $z(1 - \alpha/2)$ is the $(1 - \alpha/2)100$ percentile of the standard normal distribution.
These percentiles are given in Table A-1. The $1 - \alpha$ confidence limits for
$\rho_{12}$ are then obtained by transforming the limits on $\zeta$ by means of (15.19).

Comments 


1.  As usual, a confidence interval for $\rho_{12}$ can be employed to test whether or not $\rho_{12}$
has a specified value (say, .5) by noting whether or not the specified value falls within
the confidence limits.

2.  Confidence limits for $\rho_{12}^2$ can be obtained by squaring the respective confidence
limits for $\rho_{12}$, provided the latter limits do not differ in sign.


Example 

An economist investigated food purchasing patterns by households in a midwestern
city. He selected 200 households with family incomes between $7,500
and $17,500, and ascertained from each household, among other things, the
proportions of the food budget expended for beef and poultry, respectively. He
expected these to be negatively related, and wished to estimate the coefficient of
correlation with a 95 percent confidence interval. The economist had some supporting
evidence which suggested that the joint distribution of the two variables
does not depart markedly from a bivariate normal one.

The point estimate of $\rho_{12}$ was $r_{12} = -.61$ (data and calculations not shown).
To obtain a 95 percent confidence interval estimate, we require:

$z' = -.7089$ when $r_{12} = -.61$ (from Table A-7)

$$\sigma(z') = \frac{1}{\sqrt{200 - 3}} = .07125$$

$$z(.975) = 1.96$$

Hence, the confidence interval for $\zeta$, by (15.22), is:

$$-.849 = -.7089 - 1.96(.07125) \le \zeta \le -.7089 + 1.96(.07125) = -.569$$

Using Table A-7 to transform back to $\rho_{12}$, we obtain:

$$-.69 \le \rho_{12} \le -.51$$

This confidence interval was sufficiently precise to be useful to the economist,
confirming the negative relation and indicating that the degree of linear association
is moderately high.
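
The calculations in this example can be reproduced without the tables. The sketch below is an illustration only; the function name is invented here, and the standard normal percentile 1.96 is passed in by hand, as in the text.

```python
import math


def correlation_confidence_interval(r12, n, z_percentile):
    """Approximate confidence limits for rho12 via the z' transformation."""
    if n <= 3:
        raise ValueError("need n > 3 observations")
    if not -1 < r12 < 1:
        raise ValueError("need -1 < r12 < 1")
    z_prime = math.atanh(r12)              # (15.18); atanh(r) = 0.5*log((1+r)/(1-r))
    sd = 1.0 / math.sqrt(n - 3)            # square root of the variance (15.20)
    lower_zeta = z_prime - z_percentile * sd   # limits on zeta, (15.22)
    upper_zeta = z_prime + z_percentile * sd
    # Transform the limits on zeta back to the correlation scale, (15.19).
    return math.tanh(lower_zeta), math.tanh(upper_zeta)


# Values from the example: r12 = -.61, n = 200, z(.975) = 1.96.
lower, upper = correlation_confidence_interval(-0.61, 200, 1.96)
print(round(lower, 3), round(upper, 3))
```

The computed limits are about $-.690$ and $-.515$, which agrees with the interval obtained from Table A-7 up to the rounding involved in the table lookup.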




Caution 

Correlation models, like regression models, do not express any causal relations.
Earlier cautions about drawing conclusions as to causality from regression
findings apply equally to correlation studies. Correlation findings can be useful
in analyzing causal relationships, but they do not by themselves establish causal
patterns.


15.5  MULTIVARIATE  NORMAL  DISTRIBUTION 

The normal correlation model for the case of $p$ variables $Y_1, \ldots, Y_p$ is based
on the multivariate normal distribution. This distribution is an extension of the
bivariate normal distribution, and has corresponding properties. In particular, if
$Y_1, \ldots, Y_p$ are jointly normally distributed (i.e., they follow the multivariate
normal distribution), the marginal probability distribution of each variable $Y_k$ is
normal, with mean $\mu_k$ and standard deviation $\sigma_k$.


Conditional  inferences 

One major use of multivariate correlation models is to make conditional inferences
on one variable when the other variables have given values. It can be
shown in the case of the multivariate normal distribution that:

(15.23)  The conditional probability distribution of $Y_k$ for any given set of
values for the other variables is normal with mean given by a
linear regression function and constant variance.

Suppose $Y_1$, $Y_2$, $Y_3$, and $Y_4$ are jointly normally distributed. The conditional
probability distribution of, say, $Y_3$, when $Y_1$, $Y_2$, and $Y_4$ are fixed at any specified
levels, has the characteristics:

1.  It is normal.

2.  It has a mean given by:

$$E(Y_3 | Y_1, Y_2, Y_4) = \alpha_{3.124} + \beta_{31.24} Y_1 + \beta_{32.14} Y_2 + \beta_{34.12} Y_4$$

3.  It has constant variance $\sigma_{3.124}^2$.

The notation is a straightforward extension of that for the bivariate correlation
case. Thus, $\beta_{31.24}$ denotes the regression coefficient of $Y_1$ when $Y_3$ is regressed
on $Y_1$, $Y_2$, and $Y_4$.

It  is  clear  from  the  above  that  the  conditional  multivariate  correlation  models 
are  equivalent  to  the  normal  error  multiple  regression  model  (7.7).  Hence,  in  the 
case  where  the  variables  are  jointly  normally  distributed,  all  inferences  on  one 
variable  conditional  on  the  other  variables  being  fixed  are  carried  out  by  the  usual 
multiple  regression  techniques.  For  instance,  an  interval  estimate  of  a  conditional 
mean  would  be  obtained  by  (7.51),  or  a  prediction  of  a  new  observation  on  the 
variable  of  interest,  given  the  values  of  the  other  variables,  would  be  obtained  by 
(7.55). To facilitate use of the regression formulas, it may be helpful to relabel
the variable of interest Y and the other variables X's. Computer multiple regression
packages can be used in ordinary fashion.
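
The relabeling idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `fit_least_squares` and the small data set are hypothetical (the data were constructed to fit exactly, so the error sum of squares is essentially zero), and any ordinary regression package would serve equally well.

```python
def fit_least_squares(y, predictors):
    """Least squares fit of y on the predictor columns (intercept included).

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    Raises ZeroDivisionError if X'X is singular.
    Returns (coefficients, error sum of squares).
    """
    n = len(y)
    rows = [[1.0] + [float(col[i]) for col in predictors] for i in range(n)]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    aug = [
        [sum(r[j] * r[k] for r in rows) for k in range(p)]
        + [sum(r[j] * yi for r, yi in zip(rows, y))]
        for j in range(p)
    ]
    for c in range(p):
        # Partial pivoting for numerical stability
        pivot = max(range(c, p), key=lambda i: abs(aug[i][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for i in range(p):
            if i != c:
                factor = aug[i][c] / aug[c][c]
                aug[i] = [v - factor * w for v, w in zip(aug[i], aug[c])]
    coeffs = [aug[j][p] / aug[j][j] for j in range(p)]
    fitted = [sum(b * x for b, x in zip(coeffs, r)) for r in rows]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return coeffs, sse


# Hypothetical observations on four variables (illustration only).
y1 = [1, 2, 3, 4, 5, 6]
y2 = [2, 1, 4, 3, 6, 5]
y4 = [1, 0, 0, 1, 1, 0]
y3 = [4, 4, 3, 9, 8, 8]   # constructed as 1 + 2*Y1 - Y2 + 3*Y4

# Relabel: Y3 plays the role of Y; Y1, Y2, Y4 play the role of the X's.
coeffs, sse = fit_least_squares(y3, [y1, y2, y4])
print(coeffs, sse)
```

Here `coeffs` holds the estimates of the intercept and of the three regression coefficients for the conditional mean of the variable of interest, in the order the predictor columns were supplied.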


Coefficients  of  multiple  correlation  and  determination 

A  major  use  of  multivariate  correlation  models  is  to  study  the  relationships 
between  the  variables.  One  set  of  measures  useful  to  this  end  consists  of  the 
coefficients  of  multiple  determination  and  the  coefficients  of  multiple  correlation. 


Meaning of coefficients.  A coefficient of multiple determination is associated
with each variable. Suppose that $Y_1$, $Y_2$, $Y_3$, and $Y_4$ are included in the
correlation model. The coefficient of multiple determination associated with,
say, $Y_1$ is denoted by $\rho^2_{1.234}$, and defined as follows:

(15.24)  $\rho^2_{1.234} = \dfrac{\sigma_1^2 - \sigma^2_{1.234}}{\sigma_1^2}$

where $\sigma^2_{1.234}$ is the variance of the conditional distributions of $Y_1$ when the other
variables are fixed. Thus, $\rho^2_{1.234}$ measures how much smaller, relatively, is the
variability in the conditional distributions of $Y_1$, when the other variables are
fixed at given values, than is the variability in the marginal distribution of $Y_1$.
The other coefficients of multiple determination are defined and interpreted in
similar fashion.

It can be shown that coefficients of multiple determination take on values
between 0 and 1 inclusive. Thus:

(15.25)  $0 \le \rho^2_{1.234} \le 1$

Let us consider the significance of the limiting values. If $\rho^2_{1.234} = 0$, no reduction
in the variability of $Y_1$ takes place by considering the other variables. Equivalently,
$\rho^2_{1.234} = 0$ in the case of the normal correlation model implies that $Y_1$ is
independent of $Y_2$, $Y_3$, and $Y_4$ so that $\rho_{12} = \rho_{13} = \rho_{14} = 0$.

At the other extreme, if $\rho^2_{1.234} = 1$, the conditional distributions of $Y_1$, given
$Y_2$, $Y_3$, and $Y_4$, have no variability so that perfect predictions can be made of $Y_1$
from knowledge of the other variables.

The positive square root of a coefficient of multiple determination is the
coefficient of multiple correlation. Thus, for $Y_1$, we have in our example:

(15.26)  $\rho_{1.234} = \sqrt{\rho^2_{1.234}}$

$\rho_{1.234}$ can be viewed as a simple correlation coefficient, namely, between $Y_1$ and
$\alpha_{1.234} + \beta_{12.34} Y_2 + \beta_{13.24} Y_3 + \beta_{14.23} Y_4$.


Estimation of coefficients.  Coefficients of multiple determination and
correlation are estimated by (7.31) and (7.34), respectively, the descriptive
coefficients in the regression case. Thus, to estimate $\rho^2_{1.234}$, we need the total
sum of squares for $Y_1$, denoted by $SSTO(Y_1)$. Then we require the regression sum
of squares when $Y_1$ is regressed on $Y_2$, $Y_3$, and $Y_4$, denoted by $SSR(Y_2, Y_3, Y_4)$.


The estimator, denoted by $R^2_{1.234}$, is:

(15.27)  $R^2_{1.234} = \dfrac{SSR(Y_2, Y_3, Y_4)}{SSTO(Y_1)}$

The positive square root of $R^2_{1.234}$, denoted by $R_{1.234}$, is the estimated coefficient
of multiple correlation for $Y_1$.


Testing of coefficients.  To test, say:

(15.28)  $H_0$: $\rho_{1.234} = 0$
         $H_a$: $\rho_{1.234} \ne 0$

we can utilize the equivalent test on the regression coefficients:

(15.28a)  $H_0$: $\beta_{12.34} = \beta_{13.24} = \beta_{14.23} = 0$
          $H_a$: not all regression coefficients are zero

It turns out that the F* statistic for this test, given in (7.30b), can be expressed
directly in terms of $R^2_{1.234}$, as follows:

(15.29)  $F^* = \dfrac{R^2_{1.234}}{1 - R^2_{1.234}} \left( \dfrac{n - 4}{3} \right)$

In general, the factor on the right is $(n - q - 1)/q$ when there are q predictor
variables.

If $H_0$ holds, F* follows the $F(q, n - q - 1)$ distribution so that the decision
rule to control the Type I error risk at $\alpha$ is set up in the usual fashion:

If $F^* \le F(1 - \alpha; q, n - q - 1)$, conclude $H_0$
If $F^* > F(1 - \alpha; q, n - q - 1)$, conclude $H_a$
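
As a rough illustration of (15.27) and (15.29), the following Python sketch computes the estimated coefficient of multiple determination and the F* statistic. The function names and the sums of squares are hypothetical, chosen only to make the arithmetic easy to follow; the critical value still has to be read from a table of F percentiles.

```python
def multiple_r2(ssr: float, ssto: float) -> float:
    """Estimated coefficient of multiple determination, SSR / SSTO as in (15.27)."""
    if ssto <= 0 or not 0 <= ssr <= ssto:
        raise ValueError("need SSTO > 0 and 0 <= SSR <= SSTO")
    return ssr / ssto


def f_star(r2: float, n: int, q: int) -> float:
    """F* of (15.29), generalized to q predictor variables and n cases."""
    if not 0 <= r2 < 1:
        raise ValueError("R^2 must lie in [0, 1)")
    if n - q - 1 <= 0:
        raise ValueError("need n > q + 1")
    return (r2 / (1.0 - r2)) * ((n - q - 1) / q)


# Hypothetical sums of squares (illustration only).
r2 = multiple_r2(ssr=600.0, ssto=1000.0)   # R^2 = .6
f = f_star(r2, n=24, q=3)                  # (.6 / .4) * (20 / 3), about 10
print(r2, f)
```

The value of F* would then be compared with $F(1 - \alpha; q, n - q - 1)$ according to the decision rule above.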


Coefficients  of  partial  correlation  and  determination 

Meaning of coefficients.  Suppose again that four variables $Y_1$, $Y_2$, $Y_3$, and
$Y_4$ are included in a multivariate normal correlation model. Consider now the
correlation between $Y_1$ and $Y_2$ in the conditional joint distribution when each of
the variables $Y_3$ and $Y_4$ is fixed at a given level. When all variables are jointly
normally distributed, this correlation does not depend on the levels where $Y_3$ and
$Y_4$ are fixed and is given by:

(15.31)  $\rho_{12.34} = \dfrac{\sigma_{12.34}}{\sigma_{1.34}\,\sigma_{2.34}}$

$\rho_{12.34}$ is called the coefficient of partial correlation between $Y_1$ and $Y_2$ when $Y_3$
and $Y_4$ are fixed.

The square of $\rho_{12.34}$ is called the coefficient of partial determination and is
denoted by $\rho^2_{12.34}$. The following relation holds:

(15.32)  $\rho^2_{12.34} = \dfrac{\sigma^2_{1.34} - \sigma^2_{1.234}}{\sigma^2_{1.34}}$


Thus, $\rho^2_{12.34}$ measures how much smaller, relatively, is the variability in the
conditional distributions of $Y_1$, given $Y_2$, $Y_3$, and $Y_4$, than it is in the conditional
distributions of $Y_1$, given $Y_3$ and $Y_4$ only.

The other partial correlation coefficients are defined and interpreted in similar
fashion. For instance, $\rho_{12.3}$ measures the correlation between $Y_1$ and $Y_2$ when
only $Y_3$ is fixed.

A coefficient of partial correlation $\rho_{12.3}$ is called a first-order coefficient, a
coefficient $\rho_{12.34}$ a second-order coefficient, and so on. All partial correlation
coefficients measure the correlation between two variables; the order of the coefficient
simply indicates how many other variables are fixed in the conditional
bivariate distribution.


Point estimation of coefficients.  Point estimators of the coefficients of partial
determination and correlation are the descriptive regression measures encountered
earlier; see, for instance, (8.16). Thus, to estimate $\rho^2_{12.3}$, we regress $Y_1$
on $Y_3$ and obtain the error sum of squares $SSE(Y_3)$. Next, we regress $Y_1$ on $Y_2$ and
$Y_3$ and obtain the error sum of squares $SSE(Y_2, Y_3)$. The estimator of $\rho^2_{12.3}$ then
is:

(15.33)  $r^2_{12.3} = \dfrac{SSE(Y_3) - SSE(Y_2, Y_3)}{SSE(Y_3)} = \dfrac{SSR(Y_2 \mid Y_3)}{SSE(Y_3)}$


Tests concerning coefficients.  Researchers often wish to test whether two
variables are correlated in the conditional probability distributions when other
variables are fixed. Thus, when four variables $Y_1$, $Y_2$, $Y_3$, and $Y_4$ are being
considered and the correlation between $Y_1$ and $Y_2$, when $Y_3$ and $Y_4$ are fixed, is
of interest, the following test may be desired to see if $Y_1$ and $Y_2$ are correlated in
the conditional joint distributions:

(15.34)  $H_0$: $\rho_{12.34} = 0$
         $H_a$: $\rho_{12.34} \ne 0$

Such tests concerning partial correlation coefficients can be carried out via the
test statistic (15.16) for the bivariate case, with $r_{12}$ replaced by the estimated
partial correlation coefficient and n replaced by $n - q$, where q is the number of
variables that are held fixed.
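
The substitution just described can be sketched in Python. The sketch assumes that (15.16) has the usual bivariate form $t^* = r\sqrt{n - 2}/\sqrt{1 - r^2}$ with $n - 2$ degrees of freedom; that formula is not reproduced in this section, so treat it as an assumption. The function name is hypothetical, and the inputs are the values r = .20, n = 250, and q = 1 taken from the example later in this section.

```python
import math


def partial_corr_t_star(r: float, n: int, q: int) -> tuple[float, int]:
    """t* for H0: partial correlation = 0 with q variables held fixed.

    Assumes the bivariate statistic t* = r * sqrt(n - 2) / sqrt(1 - r^2)
    and replaces n by n - q, as the text directs.
    Returns (t*, degrees of freedom).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    df = n - q - 2
    if df <= 0:
        raise ValueError("need n - q - 2 > 0")
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    return t, df


# Values from the example in this section: r = .20, n = 250, one variable fixed.
t_star, df = partial_corr_t_star(0.20, n=250, q=1)
print(round(t_star, 3), df)
```

The absolute value of t* is then compared with the appropriate percentile of the t distribution with the returned degrees of freedom.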


Interval estimation of coefficients.  Interval estimates of partial correlation
coefficients are obtained via the z' transformation (15.18) in identical fashion to
that for simple correlation coefficients. Only the standard deviation $\sigma(z')$ need
be modified. It is, for q variables being held fixed:

(15.35)  $\sigma(z') = \dfrac{1}{\sqrt{n - q - 3}}$


Example 

An operations analyst, wishing to study the relations between three types of
test scores made by applicants for entry-level clerical positions in a large insurance
company, drew a sample of 250 such applicants from recent records and
ascertained their scores. The variables were:


$Y_1$  verbal aptitude test score
$Y_2$  reading aptitude test score
$Y_3$  personal interview score

The  multivariate  normal  model  was  considered  to  be  applicable  for  the  study. 

The analyst’s initial interest was in the partial correlation coefficient $\rho_{23.1}$.
Since her computer program did not have an option for calculating partial correlation
coefficients by a single command (many programs have such an option),
the analyst ran two separate regressions, $Y_2$ on $Y_1$ and $Y_2$ on $Y_1$ and $Y_3$. She
obtained in turn $SSE(Y_1) = 6{,}340$ and $SSE(Y_1, Y_3) = 6{,}086$. The point estimate
of $\rho^2_{23.1}$ then was calculated corresponding to (15.33):

$r^2_{23.1} = \dfrac{SSE(Y_1) - SSE(Y_1, Y_3)}{SSE(Y_1)} = \dfrac{6{,}340 - 6{,}086}{6{,}340} = .040$

Hence $r_{23.1} = .20$. (The sign of $r_{23.1}$ here is positive because the regression
coefficient for $Y_3$ when $Y_2$ is regressed on $Y_1$ and $Y_3$ is positive.) Desiring a
confidence interval with a 95 percent confidence coefficient, the analyst required:

$z' = .2027$ when $r_{23.1} = .20$

$\sigma(z') = \dfrac{1}{\sqrt{250 - 1 - 3}} = \dfrac{1}{\sqrt{246}} = .06376$

$z(.975) = 1.96$

The confidence interval for $\zeta$ by (15.22) is:

$.0777 = .2027 - 1.96(.06376) \le \zeta \le .2027 + 1.96(.06376) = .3277$

In transforming from $\zeta$ to $\rho$, the analyst obtained:

$.08 \le \rho_{23.1} \le .32$

Thus, the coefficient of partial correlation between reading aptitude score and
interview score, with verbal aptitude score fixed, was at best relatively low and
could, indeed, be close to zero.

Next the analyst ascertained from a printout of the regression of $Y_2$ on $Y_3$ that
$r_{23} = .83$. For a 95 percent confidence interval, she then obtained:

$.79 \le \rho_{23} \le .86$

Thus, $\rho_{23}$ turned out to be substantially larger than $\rho_{23.1}$.


The comparative magnitudes of $\rho_{23.1}$ and $\rho_{23}$ suggested to the analyst (and
further investigation verified) that in the company’s interviewing procedure the
interview scores ($Y_3$) depend in good part on verbal skills. Also, the reading
aptitude scores ($Y_2$) obtained with the test used by the company tend to be
heavily influenced by verbal aptitude. Thus, when verbal aptitude is not considered,
the degree of relation between $Y_2$ and $Y_3$ is relatively high since applicants
with relatively good (poor) verbal aptitude tend to score well (poorly) in both the
reading aptitude test and the personal interview. However, among applicants at
any given level of verbal aptitude score the degree of relationship between reading
score and interview score tends to be low.
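
The interval calculations in this example can be retraced with a short Python sketch. The function name `fisher_z_interval` is hypothetical; the sketch uses the fact that the z' transformation is the inverse hyperbolic tangent, and it carries more decimal places than the hand calculation, so intermediate values differ slightly from those shown above.

```python
import math
from statistics import NormalDist


def fisher_z_interval(r, n, q=0, confidence=0.95):
    """Confidence interval for a simple (q = 0) or partial (q > 0)
    correlation coefficient via the z' transformation, using the standard
    deviation 1 / sqrt(n - q - 3) from (15.35)."""
    if n - q - 3 <= 0:
        raise ValueError("need n - q - 3 > 0")
    z_prime = math.atanh(r)                    # z' transformation of r
    sd = 1.0 / math.sqrt(n - q - 3)
    z_mult = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    lower, upper = z_prime - z_mult * sd, z_prime + z_mult * sd
    return math.tanh(lower), math.tanh(upper)  # transform back to the rho scale


# Partial correlation between Y2 and Y3 with Y1 fixed, from the two SSE values.
r2_23_1 = (6340 - 6086) / 6340
r_23_1 = math.sqrt(r2_23_1)   # sign taken as positive, as explained in the text
lo_partial, hi_partial = fisher_z_interval(r_23_1, n=250, q=1)

# Simple correlation between Y2 and Y3.
lo_simple, hi_simple = fisher_z_interval(0.83, n=250)

print(round(lo_partial, 3), round(hi_partial, 3))
print(round(lo_simple, 3), round(hi_simple, 3))
```

To two decimal places the limits agree with the intervals reported above, roughly .08 to .32 for the partial coefficient and .79 to .86 for the simple coefficient.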


PROBLEMS 

15.1.  A management trainee in a production department wished to study the relation
between weight of rough casting and machining time to produce the finished
block. He selected castings so that the weights would be spaced equally apart in
the sample and observed the corresponding machining times. Would you recommend
that a regression or a correlation model be used? Explain.

15.2.  A social scientist stated: “The conditions for the bivariate and multivariate normal
distributions are so rarely met in my experience that I feel much safer using a
regression model.” Comment.

15.3.  A student was investigating from a large sample whether variables $Y_1$ and $Y_2$
follow a bivariate normal distribution. She obtained the residuals when regressing
$Y_1$ on $Y_2$, and also obtained the residuals when regressing $Y_2$ on $Y_1$. She then
prepared a normal probability plot for each set of residuals. Do these two normal
probability plots provide sufficient information for finding whether the two variables
follow a bivariate normal distribution? Explain.

15.4.  Refer to Figures 15.1 and 15.3. Where in the $Y_1 Y_2$ plane is the height of the
bivariate normal surface greatest? How can this point be ascertained from inspection
of the density function (15.1)?

15.5.  Plot a contour diagram for the bivariate normal distribution with parameters
$\mu_1 = 50$, $\mu_2 = 100$, $\sigma_1 = 3$, $\sigma_2 = 4$, and $\rho_{12} = .80$.

15.6.  Refer to Problem 15.5.

a.  State the characteristics of the marginal distribution of $Y_1$.

b.  State the characteristics of the conditional distribution of $Y_2$ when $Y_1 = 55$.

c.  State the characteristics of the conditional distribution of $Y_1$ when $Y_2 = 95$.

15.7.  Refer to Problem 15.5.

a.  Give the two regression lines for this model.

b.  Why are there two regression lines for a bivariate normal distribution and
not just one? What is the meaning of each?

c.  Must $\beta_{12}$ and $\beta_{21}$ have the same sign?

15.8.  Refer to Figure 15.6. For any specified value $Y_{h2}$, can you identify where the
conditional density $f(Y_1 \mid Y_{h2})$ is maximized? How can this result be ascertained
also from inspection of the conditional density function (15.6)?


15.9.  a.  Plot a contour diagram for the bivariate normal distribution with parameters
$\mu_1 = 14$, $\mu_2 = 350$, $\sigma_1 = 2$, $\sigma_2 = 25$, and $\rho_{12} = .90$.

b.  Give the two regression lines for this model. Why are there two regression
lines and not just one?

15.10.  Refer to Figure 15.6. If the two regression lines were not labeled, could you
determine by inspection which line pertains to the regression of $Y_2$ on $Y_1$ and
which to the regression of $Y_1$ on $Y_2$? Explain.

15.11.  Explain whether any of the following would be affected if the bivariate normal
model (15.1) were employed instead of the normal error regression model (3.1)
with fixed levels of the independent variable: (1) point estimates of the regression
coefficients, (2) confidence limits for the regression coefficients, (3) interpretation
of the confidence coefficient.

15.12.  Refer to Plastic hardness Problem 2.18. A student was analyzing these data and
received the following standard query from the interactive regression and correlation
computer package: CALCULATE CONFIDENCE INTERVAL FOR
POPULATION CORRELATION COEFFICIENT RHO? ANSWER Y OR N.
Would a “yes” response provide meaningful information here? Explain.

15.13.  Property assessments.  The observations that follow show assessed value for
property tax purposes ($Y_1$, in thousand dollars) and sales price ($Y_2$, in thousand
dollars) for a sample of 15 parcels of land for industrial development sold recently
in “arm’s-length” transactions in a tax district. Assume that bivariate
normal model (15.1) is appropriate here.

   i:      1      2      3      4      5      6      7      8
Y_i1:   13.9   16.0   10.3   11.8   16.7   12.5   10.0   11.4
Y_i2:   28.6   34.7   21.0   25.5   36.8   24.0   19.1   22.5

   i:      9     10     11     12     13     14     15
Y_i1:   13.9   12.2   15.4   14.8   14.9   12.9   15.8
Y_i2:   28.3   25.0   31.1   29.6   35.1   30.0   36.2

a.  Plot the observations in a scatter diagram. Does the bivariate normal model
appear to be appropriate here? Discuss.

b.  Calculate $r_{12}$. What parameter is estimated by $r_{12}$? What is the interpretation
of this parameter?

c.  Test whether or not $Y_1$ and $Y_2$ are statistically independent in the population,
using test statistic (15.16) and level of significance .01. State the alternatives,
decision rule, and conclusion.

d.  To test $\rho_{12} = .6$ versus $\rho_{12} \ne .6$, would it be appropriate to use test statistic
(15.16)?

15.14.  Contract profitability.  A cost analyst for a drilling and blasting contractor
examined 84 contracts handled in the last two years and found that the coefficient
of correlation between value of contract ($Y_1$) and profit contribution generated by
the contract ($Y_2$) is $r_{12} = .61$. Assume that the bivariate normal model (15.1)
applies.

a.  Test whether or not $Y_1$ and $Y_2$ are statistically independent in the population;
use $\alpha = .05$. State the alternatives, decision rule, and conclusion.

b.  Estimate $\rho_{12}$ with a 95 percent confidence interval.


c.  Convert the confidence interval in part (b) to a 95 percent confidence interval
for $\rho^2_{12}$. Interpret this interval estimate.

15.15.  Bid preparation.  A building construction consultant studied the relationship
between cost of bid preparation ($Y_1$) and amount of bid ($Y_2$) for the consulting
firm’s clients. In a sample of 103 bids prepared by clients, $r_{12} = .87$. Assume
that the bivariate normal model (15.1) applies.

a.  Test whether or not $\rho_{12} = 0$; control the risk of Type I error at .10. State the
alternatives, decision rule, and conclusion. What would be the implication if
$\rho_{12} = 0$?

b.  Obtain a 90 percent confidence interval for $\rho_{12}$. Interpret this interval estimate.

c.  Convert the confidence interval in part (b) to a 90 percent confidence interval
for $\rho^2_{12}$.

15.16.  Water flow.  An engineer, desiring to estimate the coefficient of correlation $\rho_{12}$
between rate of water flow at point A in a stream ($Y_1$) and concurrent rate of flow
at point B ($Y_2$), obtained $r_{12} = .83$ in a sample of 147 observations. Assume that
the bivariate normal model (15.1) is appropriate.

a.  Obtain a 99 percent confidence interval for $\rho_{12}$.

b.  Convert the confidence interval in part (a) to a 99 percent confidence interval
for $\rho^2_{12}$.

15.17.  Pharmaceutical  study.  A  marketing  research  analyst  in  a  pharmaceutical  firm 
wishes  to  study  the  relation  between  the  following  quantified  variables  in  the 
target  population: 

Y1  socioeconomic status of homemaker
Y2  magazine  media  exposure  to  homemaker 
Y3  awareness  of  homemaker  to  firm’s  new  product 

She  believes  it  is  reasonable  to  assume  that  the  variables  follow  a  multivariate 
normal  distribution. 

a.  Explain  what  is  measured  by  each  of  the  following:  (1)  <73  i2,  (2)  cr31  2, 
(3)  oi3. 

b.  Explain  the  meaning  of  each  of  the  following:  (1)  p2.13,  (2)  P2.3,  (3)  pi2.3- 

15.18.  Household survey.  In a random sample of 250 households, the coefficient of
multiple correlation between purchases of soft drinks ($Y_1$) and purchases of
snack foods ($Y_2$) and purchases of dairy products ($Y_3$) was $R_{1.23} = .892$. The
coefficient of partial correlation $r_{13.2}$ was found to be $-.413$.

a.  Test whether or not $\rho_{1.23} = 0$; use $\alpha = .05$. State the alternatives, decision
rule, and conclusion.

b.  Test whether or not $\rho_{13.2} = 0$; use $\alpha = .05$. State the alternatives, decision
rule, and conclusion.

c.  Estimate $\rho_{13.2}$ with a 95 percent confidence interval. Convert this confidence
interval into a 95 percent confidence interval for $\rho^2_{13.2}$. Interpret this
latter interval estimate.

15.19.  Microcomputer assembly.  A consultant studied the relation between visual
acuity ($Y_1$), manual dexterity ($Y_2$), and tactile reflex ($Y_3$) in persons who had
proven to be proficient as assemblers of microcomputer components. Observations
on these variables were obtained for a random sample of 30 persons. Sums
of squares for regressions run by the consultant follow. Assume that the multivariate
normal correlation model applies.

                                        Regression
Sum of Squares   Y1 on Y2   Y1 on Y3   Y1 on Y2, Y3   Y2 on Y1   Y2 on Y3
Regression          7,436         66          7,439      6,900         42
Error                 392      7,762            389        363      7,221
Total               7,828      7,828          7,828      7,263      7,263

                                   Regression
Sum of Squares   Y2 on Y1, Y3   Y3 on Y1   Y3 on Y2   Y3 on Y1, Y2
Regression              6,901         46         32             64
Error                     362      5,327      5,341          5,309
Total                   7,263      5,373      5,373          5,373

a.  What does $\rho^2_{1.23}$ denote here? Estimate this parameter by a point estimate.
What information is provided by this estimate?

b.  Test whether or not $\rho_{1.23} = 0$ using a level of significance of .01. State the
alternatives, decision rule, and conclusion.

c.  Test whether or not $\rho_{12.3} = 0$; use $\alpha = .01$. State the alternatives, decision
rule, and conclusion.

d.  Estimate $\rho_{12.3}$ with a 99 percent confidence interval. Interpret your interval
estimate.

15.20.  Refer to Microcomputer assembly Problem 15.19. Obtain $r_{12}$, $r_{13}$, and $r_{23}$.
Also obtain $r_{12.3}$, $r_{13.2}$, and $r_{23.1}$. Summarize the information provided by these
coefficients.


EXERCISES 

15.21.  The random variables $Y_1$ and $Y_2$ follow the bivariate normal distribution (15.1).
Show that if $\rho_{12} = 0$, $Y_1$ and $Y_2$ are independent random variables.

15.22.  (Calculus needed.)

a.  Obtain the maximum likelihood estimators of the parameters of the bivariate
normal distribution (15.1).

b.  Using the results in part (a), obtain the maximum likelihood estimators of
the parameters of the conditional probability distribution of $Y_1$ for any value
of $Y_2$, as given in (15.7).

c.  Show that the maximum likelihood estimators of $\alpha_{1.2}$ and $\beta_{12}$ obtained in
part (b) are the same as the least squares estimators (2.10) for the regression
coefficients in the simple linear regression model.

15.23.  Show that test statistics (3.17) and (15.16) are equivalent.

15.24.  Show that test statistic (7.30b) for p = 4 is equivalent to test statistic (15.29).


15.25.  Show that the ratio SSR/SSTO is the same whether $Y_1$ is regressed on $Y_2$ or $Y_2$ is
regressed on $Y_1$. [Hint: Use (3.50) and (2.10a).]


PROJECT 


15.26.  Refer  to  Pharmaceutical  study  Problem  15.17.  The  following  observations 
were  obtained  in  a  pilot  study  involving  a  random  sample  of  42  homemakers: 


Homemaker                        Homemaker
    i       Y_i1   Y_i2   Y_i3       i       Y_i1   Y_i2   Y_i3
    1        76     81     53       22        57     51     32
    2        92     89     78       23        49     52     28
    3        85     84     66       24        71     72     54
    4        63     70     46       25        70     66     55
    5        50     48     38       26        91     86     67
    6        89     93     68       27        95     91     78
    7        35     29     20       28        81     79     60
    8        67     68     41       29        40     38     15
    9        94     98     77       30        41     43     18
   10        83     79     61       31        88     91     72
   11        24     28     12       32        63     63     40
   12        31     37     14       33        59     54     34
   13        42     36     17       34        24     26     11
   14        71     71     53       35        96     91     79
   15        87     89     71       36        74     80     52
   16        28     23      3       37        30     38     14
   17        36     36     12       38        81     85     65
   18        61     58     35       39        96     92     78
   19        74     70     47       40        38     40     16
   20        66     60     40       41        52     46     35
   21        81     86     66       42        76     68     42

a.  The analyst first wished to test whether or not $\rho_{3.12} = 0$. Perform the test
using a level of significance of .01. State the alternatives, decision rule, and
conclusion. What is the implication for the analyst if $\rho_{3.12} = 0$?

b.  Estimate each coefficient of simple correlation and each coefficient of partial
correlation with a separate 99 percent confidence interval. What is the
family confidence coefficient for the entire set of estimates?

c.  Analyze  the  results  obtained  in  part  (b)  and  summarize  your  findings. 


Appendix tables


TABLE A-1  Cumulative probabilities of the standard normal distribution
Entry is area A under the standard normal curve from $-\infty$ to z(A)

  z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
 .0   .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
 .1   .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
 .2   .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
 .3   .6179  .6217  .6255  .6293  .6331  .6368  .6406  .6443  .6480  .6517
 .4   .6554  .6591  .6628  .6664  .6700  .6736  .6772  .6808  .6844  .6879
 .5   .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
 .6   .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
 .7   .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
 .8   .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
 .9   .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
1.0   .8413  .8438  .8461  .8485  .8508  .8531  .8554  .8577  .8599  .8621
1.1   .8643  .8665  .8686  .8708  .8729  .8749  .8770  .8790  .8810  .8830
1.2   .8849  .8869  .8888  .8907  .8925  .8944  .8962  .8980  .8997  .9015
1.3   .9032  .9049  .9066  .9082  .9099  .9115  .9131  .9147  .9162  .9177
1.4   .9192  .9207  .9222  .9236  .9251  .9265  .9279  .9292  .9306  .9319
1.5   .9332  .9345  .9357  .9370  .9382  .9394  .9406  .9418  .9429  .9441
1.6   .9452  .9463  .9474  .9484  .9495  .9505  .9515  .9525  .9535  .9545
1.7   .9554  .9564  .9573  .9582  .9591  .9599  .9608  .9616  .9625  .9633
1.8   .9641  .9649  .9656  .9664  .9671  .9678  .9686  .9693  .9699  .9706
1.9   .9713  .9719  .9726  .9732  .9738  .9744  .9750  .9756  .9761  .9767
2.0   .9772  .9778  .9783  .9788  .9793  .9798  .9803  .9808  .9812  .9817
2.1   .9821  .9826  .9830  .9834  .9838  .9842  .9846  .9850  .9854  .9857
2.2   .9861  .9864  .9868  .9871  .9875  .9878  .9881  .9884  .9887  .9890
2.3   .9893  .9896  .9898  .9901  .9904  .9906  .9909  .9911  .9913  .9916
2.4   .9918  .9920  .9922  .9925  .9927  .9929  .9931  .9932  .9934  .9936
2.5   .9938  .9940  .9941  .9943  .9945  .9946  .9948  .9949  .9951  .9952
2.6   .9953  .9955  .9956  .9957  .9959  .9960  .9961  .9962  .9963  .9964
2.7   .9965  .9966  .9967  .9968  .9969  .9970  .9971  .9972  .9973  .9974
2.8   .9974  .9975  .9976  .9977  .9977  .9978  .9979  .9979  .9980  .9981
2.9   .9981  .9982  .9982  .9983  .9984  .9984  .9985  .9985  .9986  .9986
3.0   .9987  .9987  .9987  .9988  .9988  .9989  .9989  .9989  .9990  .9990
3.1   .9990  .9991  .9991  .9991  .9992  .9992  .9992  .9992  .9993  .9993
3.2   .9993  .9993  .9994  .9994  .9994  .9994  .9994  .9995  .9995  .9995
3.3   .9995  .9995  .9995  .9996  .9996  .9996  .9996  .9996  .9996  .9997
3.4   .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9998

Selected Percentiles

Cumulative probability A:    .90     .95    .975     .98     .99    .995    .999
z(A):                      1.282   1.645   1.960   2.054   2.326   2.576   3.090

TABLE A-2  Percentiles of the t distribution

Entry is $t(A; \nu)$ where $P\{t(\nu) \le t(A; \nu)\} = A$

                               A
  ν      .60     .70     .80     .85     .90     .95    .975
  1    0.325   0.727   1.376   1.963   3.078   6.314  12.706
  2    0.289   0.617   1.061   1.386   1.886   2.920   4.303
  3    0.277   0.584   0.978   1.250   1.638   2.353   3.182
  4    0.271   0.569   0.941   1.190   1.533   2.132   2.776
  5    0.267   0.559   0.920   1.156   1.476   2.015   2.571
  6    0.265   0.553   0.906   1.134   1.440   1.943   2.447
  7    0.263   0.549   0.896   1.119   1.415   1.895   2.365
  8    0.262   0.546   0.889   1.108   1.397   1.860   2.306
  9    0.261   0.543   0.883   1.100   1.383   1.833   2.262
 10    0.260   0.542   0.879   1.093   1.372   1.812   2.228
 11    0.260   0.540   0.876   1.088   1.363   1.796   2.201
 12    0.259   0.539   0.873   1.083   1.356   1.782   2.179
 13    0.259   0.537   0.870   1.079   1.350   1.771   2.160
 14    0.258   0.537   0.868   1.076   1.345   1.761   2.145
 15    0.258   0.536   0.866   1.074   1.341   1.753   2.131
 16    0.258   0.535   0.865   1.071   1.337   1.746   2.120
 17    0.257   0.534   0.863   1.069   1.333   1.740   2.110
 18    0.257   0.534   0.862   1.067   1.330   1.734   2.101
 19    0.257   0.533   0.861   1.066   1.328   1.729   2.093
 20    0.257   0.533   0.860   1.064   1.325   1.725   2.086
 21    0.257   0.532   0.859   1.063   1.323   1.721   2.080
 22    0.256   0.532   0.858   1.061   1.321   1.717   2.074
 23    0.256   0.532   0.858   1.060   1.319   1.714   2.069
 24    0.256   0.531   0.857   1.059   1.318   1.711   2.064
 25    0.256   0.531   0.856   1.058   1.316   1.708   2.060
 26    0.256   0.531   0.856   1.058   1.315   1.706   2.056
 27    0.256   0.531   0.855   1.057   1.314   1.703   2.052
 28    0.256   0.530   0.855   1.056   1.313   1.701   2.048
 29    0.256   0.530   0.854   1.055   1.311   1.699   2.045
 30    0.256   0.530   0.854   1.055   1.310   1.697   2.042
 40    0.255   0.529   0.851   1.050   1.303   1.684   2.021
 60    0.254   0.527   0.848   1.045   1.296   1.671   2.000
120    0.254   0.526   0.845   1.041   1.289   1.658   1.980
  ∞    0.253   0.524   0.842   1.036   1.282   1.645   1.960

TABLE A-2 (concluded)  Percentiles of the t distribution

                                 A
  ν      .98    .985     .99   .9925    .995    .9975    .9995
  1   15.895  21.205  31.821  42.434  63.657  127.322  636.590
  2    4.849   5.643   6.965   8.073   9.925   14.089   31.598
  3    3.482   3.896   4.541   5.047   5.841    7.453   12.924
  4    2.999   3.298   3.747   4.088   4.604    5.598    8.610
  5    2.757   3.003   3.365   3.634   4.032    4.773    6.869
  6    2.612   2.829   3.143   3.372   3.707    4.317    5.959
  7    2.517   2.715   2.998   3.203   3.499    4.029    5.408
  8    2.449   2.634   2.896   3.085   3.355    3.833    5.041
  9    2.398   2.574   2.821   2.998   3.250    3.690    4.781
 10    2.359   2.527   2.764   2.932   3.169    3.581    4.587
 11    2.328   2.491   2.718   2.879   3.106    3.497    4.437
 12    2.303   2.461   2.681   2.836   3.055    3.428    4.318
 13    2.282   2.436   2.650   2.801   3.012    3.372    4.221
 14    2.264   2.415   2.624   2.771   2.977    3.326    4.140
 15    2.249   2.397   2.602   2.746   2.947    3.286    4.073
 16    2.235   2.382   2.583   2.724   2.921    3.252    4.015
 17    2.224   2.368   2.567   2.706   2.898    3.222    3.965
 18    2.214   2.356   2.552   2.689   2.878    3.197    3.922
 19    2.205   2.346   2.539   2.674   2.861    3.174    3.883
 20    2.197   2.336   2.528   2.661   2.845    3.153    3.849
 21    2.189   2.328   2.518   2.649   2.831    3.135    3.819
 22    2.183   2.320   2.508   2.639   2.819    3.119    3.792
 23    2.177   2.313   2.500   2.629   2.807    3.104    3.768
 24    2.172   2.307   2.492   2.620   2.797    3.091    3.745
 25    2.167   2.301   2.485   2.612   2.787    3.078    3.725
 26    2.162   2.296   2.479   2.605   2.779    3.067    3.707
 27    2.158   2.291   2.473   2.598   2.771    3.057    3.690
 28    2.154   2.286   2.467   2.592   2.763    3.047    3.674
 29    2.150   2.282   2.462   2.586   2.756    3.038    3.659
 30    2.147   2.278   2.457   2.581   2.750    3.030    3.646
 40    2.123   2.250   2.423   2.542   2.704    2.971    3.551
 60    2.099   2.223   2.390   2.504   2.660    2.915    3.460
120    2.076   2.196   2.358   2.468   2.617    2.860    3.373
  ∞    2.054   2.170   2.326   2.432   2.576    2.807    3.291



TABLE A-3 Percentiles of the χ² distribution

Entry is χ²(A; ν) where P{χ²(ν) ≤ χ²(A; ν)} = A

                                                    A
  ν      .005      .010      .025      .050      .100      .900      .950      .975      .990      .995
  1 0.0000393  0.000157  0.000982   0.00393    0.0158      2.71      3.84      5.02      6.63      7.88
  2    0.0100    0.0201    0.0506     0.103     0.211      4.61      5.99      7.38      9.21     10.60
  3     0.072     0.115     0.216     0.352     0.584      6.25      7.81      9.35     11.34     12.84
  4     0.207     0.297     0.484     0.711     1.064      7.78      9.49     11.14     13.28     14.86
  5     0.412     0.554     0.831     1.145      1.61      9.24     11.07     12.83     15.09     16.75
  6     0.676     0.872      1.24      1.64      2.20     10.64     12.59     14.45     16.81     18.55
  7     0.989      1.24      1.69      2.17      2.83     12.02     14.07     16.01     18.48     20.28
  8      1.34      1.65      2.18      2.73      3.49     13.36     15.51     17.53     20.09     21.96
  9      1.73      2.09      2.70      3.33      4.17     14.68     16.92     19.02     21.67     23.59
 10      2.16      2.56      3.25      3.94      4.87     15.99     18.31     20.48     23.21     25.19
 11      2.60      3.05      3.82      4.57      5.58     17.28     19.68     21.92     24.73     26.76
 12      3.07      3.57      4.40      5.23      6.30     18.55     21.03     23.34     26.22     28.30
 13      3.57      4.11      5.01      5.89      7.04     19.81     22.36     24.74     27.69     29.82
 14      4.07      4.66      5.63      6.57      7.79     21.06     23.68     26.12     29.14     31.32
 15      4.60      5.23      6.26      7.26      8.55     22.31     25.00     27.49     30.58     32.80
 16      5.14      5.81      6.91      7.96      9.31     23.54     26.30     28.85     32.00     34.27
 17      5.70      6.41      7.56      8.67     10.09     24.77     27.59     30.19     33.41     35.72
 18      6.26      7.01      8.23      9.39     10.86     25.99     28.87     31.53     34.81     37.16
 19      6.84      7.63      8.91     10.12     11.65     27.20     30.14     32.85     36.19     38.58
 20      7.43      8.26      9.59     10.85     12.44     28.41     31.41     34.17     37.57     40.00
 21      8.03      8.90     10.28     11.59     13.24     29.62     32.67     35.48     38.93     41.40
 22      8.64      9.54     10.98     12.34     14.04     30.81     33.92     36.78     40.29     42.80
 23      9.26     10.20     11.69     13.09     14.85     32.01     35.17     38.08     41.64     44.18
 24      9.89     10.86     12.40     13.85     15.66     33.20     36.42     39.36     42.98     45.56
 25     10.52     11.52     13.12     14.61     16.47     34.38     37.65     40.65     44.31     46.93
 26     11.16     12.20     13.84     15.38     17.29     35.56     38.89     41.92     45.64     48.29
 27     11.81     12.88     14.57     16.15     18.11     36.74     40.11     43.19     46.96     49.64
 28     12.46     13.56     15.31     16.93     18.94     37.92     41.34     44.46     48.28     50.99
 29     13.12     14.26     16.05     17.71     19.77     39.09     42.56     45.72     49.59     52.34
 30     13.79     14.95     16.79     18.49     20.60     40.26     43.77     46.98     50.89     53.67
 40     20.71     22.16     24.43     26.51     29.05     51.81     55.76     59.34     63.69     66.77
 50     27.99     29.71     32.36     34.76     37.69     63.17     67.50     71.42     76.15     79.49
 60     35.53     37.48     40.48     43.19     46.46     74.40     79.08     83.30     88.38     91.95
 70     43.28     45.44     48.76     51.74     55.33     85.53     90.53     95.02     100.4     104.2
 80     51.17     53.54     57.15     60.39     64.28     96.58     101.9     106.6     112.3     116.3
 90     59.20     61.75     65.65     69.13     73.29     107.6     113.1     118.1     124.1     128.3
100     67.33     70.06     74.22     77.93     82.36     118.5     124.3     129.6     135.8     140.2

Source:  Reprinted,  with  permission,  from  C.  M.  Thompson,  “Table  of  Percentage  Points  of  the  Chi-Square 
Distribution,”  Biometrika  32  (1941),  pp.  188-89. 




TABLE A-4 Percentiles of the F distribution

Entry is F(A; ν1, ν2) where P{F(ν1, ν2) ≤ F(A; ν1, ν2)} = A

F(A; ν1, ν2) = 1 / F(1 - A; ν2, ν1)
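
Table A-4 lists only percentiles for A of .50 and above. A lower-tail percentile can be obtained from the reciprocal relation F(A; ν1, ν2) = 1 / F(1 - A; ν2, ν1), with the degrees of freedom interchanged. The following is a minimal Python sketch of that lookup; the dictionary TABLE_A4 and the function lower_percentile are illustrative names, not part of the text, and only two tabled entries are hard-coded.

```python
# Two upper-tail entries copied from Table A-4, keyed by
# (A, numerator df, denominator df). Names here are illustrative only.
TABLE_A4 = {
    (0.95, 2, 3): 9.55,   # F(.95; 2, 3)
    (0.90, 4, 5): 3.52,   # F(.90; 4, 5)
}


def lower_percentile(a, num_df, den_df, table=TABLE_A4):
    """Return F(a; num_df, den_df) as 1 / F(1 - a; den_df, num_df)."""
    # Note the interchange of numerator and denominator df in the lookup key.
    key = (round(1.0 - a, 3), den_df, num_df)
    return 1.0 / table[key]


f_05 = lower_percentile(0.05, 3, 2)   # uses F(.95; 2, 3) = 9.55
f_10 = lower_percentile(0.10, 5, 4)   # uses F(.90; 4, 5) = 3.52
print(round(f_05, 4), round(f_10, 4))
```

The lookup raises a KeyError for any percentile whose upper-tail counterpart has not been entered, which seems preferable to silently interpolating.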




TABLE A-4 (continued) Percentiles of the F distribution

Den.                                      Numerator df
 df  A           1        2        3        4        5        6        7        8        9
  1  .50      1.00     1.50     1.71     1.82     1.89     1.94     1.98     2.00     2.03
     .90      39.9     49.5     53.6     55.8     57.2     58.2     58.9     59.4     59.9
     .95       161      200      216      225      230      234      237      239      241
     .975      648      800      864      900      922      937      948      957      963
     .99     4,052    5,000    5,403    5,625    5,764    5,859    5,928    5,981    6,022
     .995   16,211   20,000   21,615   22,500   23,056   23,437   23,715   23,925   24,091
     .999  405,280  500,000  540,380  562,500  576,400  585,940  592,870  598,140  602,280

  2  .50     0.667     1.00     1.13     1.21     1.25     1.28     1.30     1.32     1.33
     .90      8.53     9.00     9.16     9.24     9.29     9.33     9.35     9.37     9.38
     .95      18.5     19.0     19.2     19.2     19.3     19.3     19.4     19.4     19.4
     .975     38.5     39.0     39.2     39.2     39.3     39.3     39.4     39.4     39.4
     .99      98.5     99.0     99.2     99.2     99.3     99.3     99.4     99.4     99.4
     .995      199      199      199      199      199      199      199      199      199
     .999    998.5    999.0    999.2    999.2    999.3    999.3    999.4    999.4    999.4

  3  .50     0.585    0.881     1.00     1.06     1.10     1.13     1.15     1.16     1.17
     .90      5.54     5.46     5.39     5.34     5.31     5.28     5.27     5.25     5.24
     .95      10.1     9.55     9.28     9.12     9.01     8.94     8.89     8.85     8.81
     .975     17.4     16.0     15.4     15.1     14.9     14.7     14.6     14.5     14.5
     .99      34.1     30.8     29.5     28.7     28.2     27.9     27.7     27.5     27.3
     .995     55.6     49.8     47.5     46.2     45.4     44.8     44.4     44.1     43.9
     .999    167.0    148.5    141.1    137.1    134.6    132.8    131.6    130.6    129.9

  4  .50     0.549    0.828    0.941     1.00     1.04     1.06     1.08     1.09     1.10
     .90      4.54     4.32     4.19     4.11     4.05     4.01     3.98     3.95     3.94
     .95      7.71     6.94     6.59     6.39     6.26     6.16     6.09     6.04     6.00
     .975     12.2     10.6     9.98     9.60     9.36     9.20     9.07     8.98     8.90
     .99      21.2     18.0     16.7     16.0     15.5     15.2     15.0     14.8     14.7
     .995     31.3     26.3     24.3     23.2     22.5     22.0     21.6     21.4     21.1
     .999     74.1     61.2     56.2     53.4     51.7     50.5     49.7     49.0     48.5

  5  .50     0.528    0.799    0.907    0.965     1.00     1.02     1.04     1.05     1.06
     .90      4.06     3.78     3.62     3.52     3.45     3.40     3.37     3.34     3.32
     .95      6.61     5.79     5.41     5.19     5.05     4.95     4.88     4.82     4.77
     .975     10.0     8.43     7.76     7.39     7.15     6.98     6.85     6.76     6.68
     .99      16.3     13.3     12.1     11.4     11.0     10.7     10.5     10.3     10.2
     .995     22.8     18.3     16.5     15.6     14.9     14.5     14.2     14.0     13.8
     .999     47.2     37.1     33.2     31.1     29.8     28.8     28.2     27.6     27.2

  6  .50     0.515    0.780    0.886    0.942    0.977     1.00     1.02     1.03     1.04
     .90      3.78     3.46     3.29     3.18     3.11     3.05     3.01     2.98     2.96
     .95      5.99     5.14     4.76     4.53     4.39     4.28     4.21     4.15     4.10
     .975     8.81     7.26     6.60     6.23     5.99     5.82     5.70     5.60     5.52
     .99      13.7     10.9     9.78     9.15     8.75     8.47     8.26     8.10     7.98
     .995     18.6     14.5     12.9     12.0     11.5     11.1     10.8     10.6     10.4
     .999     35.5     27.0     23.7     21.9     20.8     20.0     19.5     19.0     18.7

  7  .50     0.506    0.767    0.871    0.926    0.960    0.983     1.00     1.01     1.02
     .90      3.59     3.26     3.07     2.96     2.88     2.83     2.78     2.75     2.72
     .95      5.59     4.74     4.35     4.12     3.97     3.87     3.79     3.73     3.68
     .975     8.07     6.54     5.89     5.52     5.29     5.12     4.99     4.90     4.82
     .99      12.2     9.55     8.45     7.85     7.46     7.19     6.99     6.84     6.72
     .995     16.2     12.4     10.9     10.1     9.52     9.16     8.89     8.68     8.51
     .999     29.2     21.7     18.8     17.2     16.2     15.5     15.0     14.6     14.3



TABLE A-4 (continued) Percentiles of the F distribution

Den.                                 Numerator df
 df  A          1       2       3       4       5       6       7       8       9
  8  .50    0.499   0.757   0.860   0.915   0.948   0.971   0.988    1.00    1.01
     .90     3.46    3.11    2.92    2.81    2.73    2.67    2.62    2.59    2.56
     .95     5.32    4.46    4.07    3.84    3.69    3.58    3.50    3.44    3.39
     .975    7.57    6.06    5.42    5.05    4.82    4.65    4.53    4.43    4.36
     .99     11.3    8.65    7.59    7.01    6.63    6.37    6.18    6.03    5.91
     .995    14.7    11.0    9.60    8.81    8.30    7.95    7.69    7.50    7.34
     .999    25.4    18.5    15.8    14.4    13.5    12.9    12.4    12.0    11.8

  9  .50    0.494   0.749   0.852   0.906   0.939   0.962   0.978   0.990    1.00
     .90     3.36    3.01    2.81    2.69    2.61    2.55    2.51    2.47    2.44
     .95     5.12    4.26    3.86    3.63    3.48    3.37    3.29    3.23    3.18
     .975    7.21    5.71    5.08    4.72    4.48    4.32    4.20    4.10    4.03
     .99     10.6    8.02    6.99    6.42    6.06    5.80    5.61    5.47    5.35
     .995    13.6    10.1    8.72    7.96    7.47    7.13    6.88    6.69    6.54
     .999    22.9    16.4    13.9    12.6    11.7    11.1    10.7    10.4    10.1

 10  .50    0.490   0.743   0.845   0.899   0.932   0.954   0.971   0.983   0.992
     .90     3.29    2.92    2.73    2.61    2.52    2.46    2.41    2.38    2.35
     .95     4.96    4.10    3.71    3.48    3.33    3.22    3.14    3.07    3.02
     .975    6.94    5.46    4.83    4.47    4.24    4.07    3.95    3.85    3.78
     .99     10.0    7.56    6.55    5.99    5.64    5.39    5.20    5.06    4.94
     .995    12.8    9.43    8.08    7.34    6.87    6.54    6.30    6.12    5.97
     .999    21.0    14.9    12.6    11.3    10.5    9.93    9.52    9.20    8.96

 12  .50    0.484   0.735   0.835   0.888   0.921   0.943   0.959   0.972   0.981
     .90     3.18    2.81    2.61    2.48    2.39    2.33    2.28    2.24    2.21
     .95     4.75    3.89    3.49    3.26    3.11    3.00    2.91    2.85    2.80
     .975    6.55    5.10    4.47    4.12    3.89    3.73    3.61    3.51    3.44
     .99     9.33    6.93    5.95    5.41    5.06    4.82    4.64    4.50    4.39
     .995    11.8    8.51    7.23    6.52    6.07    5.76    5.52    5.35    5.20
     .999    18.6    13.0    10.8    9.63    8.89    8.38    8.00    7.71    7.48

 15  .50    0.478   0.726   0.826   0.878   0.911   0.933   0.949   0.960   0.970
     .90     3.07    2.70    2.49    2.36    2.27    2.21    2.16    2.12    2.09
     .95     4.54    3.68    3.29    3.06    2.90    2.79    2.71    2.64    2.59
     .975    6.20    4.77    4.15    3.80    3.58    3.41    3.29    3.20    3.12
     .99     8.68    6.36    5.42    4.89    4.56    4.32    4.14    4.00    3.89
     .995    10.8    7.70    6.48    5.80    5.37    5.07    4.85    4.67    4.54
     .999    16.6    11.3    9.34    8.25    7.57    7.09    6.74    6.47    6.26

 20  .50    0.472   0.718   0.816   0.868   0.900   0.922   0.938   0.950   0.959
     .90     2.97    2.59    2.38    2.25    2.16    2.09    2.04    2.00    1.96
     .95     4.35    3.49    3.10    2.87    2.71    2.60    2.51    2.45    2.39
     .975    5.87    4.46    3.86    3.51    3.29    3.13    3.01    2.91    2.84
     .99     8.10    5.85    4.94    4.43    4.10    3.87    3.70    3.56    3.46
     .995    9.94    6.99    5.82    5.17    4.76    4.47    4.26    4.09    3.96
     .999    14.8    9.95    8.10    7.10    6.46    6.02    5.69    5.44    5.24

 24  .50    0.469   0.714   0.812   0.863   0.895   0.917   0.932   0.944   0.953
     .90     2.93    2.54    2.33    2.19    2.10    2.04    1.98    1.94    1.91
     .95     4.26    3.40    3.01    2.78    2.62    2.51    2.42    2.36    2.30
     .975    5.72    4.32    3.72    3.38    3.15    2.99    2.87    2.78    2.70
     .99     7.82    5.61    4.72    4.22    3.90    3.67    3.50    3.36    3.26
     .995    9.55    6.66    5.52    4.89    4.49    4.20    3.99    3.83    3.69
     .999    14.0    9.34    7.55    6.59    5.98    5.55    5.23    4.99    4.80



TABLE A-4 (continued) Percentiles of the F distribution

Den.                                 Numerator df
 df  A         10      12      15      20      24      30      60     120       ∞
  8  .50     1.02    1.03    1.04    1.05    1.06    1.07    1.08    1.08    1.09
     .90     2.54    2.50    2.46    2.42    2.40    2.38    2.34    2.32    2.29
     .95     3.35    3.28    3.22    3.15    3.12    3.08    3.01    2.97    2.93
     .975    4.30    4.20    4.10    4.00    3.95    3.89    3.78    3.73    3.67
     .99     5.81    5.67    5.52    5.36    5.28    5.20    5.03    4.95    4.86
     .995    7.21    7.01    6.81    6.61    6.50    6.40    6.18    6.06    5.95
     .999    11.5    11.2    10.8    10.5    10.3    10.1    9.73    9.53    9.33

  9  .50     1.01    1.02    1.03    1.04    1.05    1.05    1.07    1.07    1.08
     .90     2.42    2.38    2.34    2.30    2.28    2.25    2.21    2.18    2.16
     .95     3.14    3.07    3.01    2.94    2.90    2.86    2.79    2.75    2.71
     .975    3.96    3.87    3.77    3.67    3.61    3.56    3.45    3.39    3.33
     .99     5.26    5.11    4.96    4.81    4.73    4.65    4.48    4.40    4.31
     .995    6.42    6.23    6.03    5.83    5.73    5.62    5.41    5.30    5.19
     .999    9.89    9.57    9.24    8.90    8.72    8.55    8.19    8.00    7.81

 10  .50     1.00    1.01    1.02    1.03    1.04    1.05    1.06    1.06    1.07
     .90     2.32    2.28    2.24    2.20    2.18    2.16    2.11    2.08    2.06
     .95     2.98    2.91    2.84    2.77    2.74    2.70    2.62    2.58    2.54
     .975    3.72    3.62    3.52    3.42    3.37    3.31    3.20    3.14    3.08
     .99     4.85    4.71    4.56    4.41    4.33    4.25    4.08    4.00    3.91
     .995    5.85    5.66    5.47    5.27    5.17    5.07    4.86    4.75    4.64
     .999    8.75    8.45    8.13    7.80    7.64    7.47    7.12    6.94    6.76

 12  .50    0.989    1.00    1.01    1.02    1.03    1.03    1.05    1.05    1.06
     .90     2.19    2.15    2.10    2.06    2.04    2.01    1.96    1.93    1.90
     .95     2.75    2.69    2.62    2.54    2.51    2.47    2.38    2.34    2.30
     .975    3.37    3.28    3.18    3.07    3.02    2.96    2.85    2.79    2.72
     .99     4.30    4.16    4.01    3.86    3.78    3.70    3.54    3.45    3.36
     .995    5.09    4.91    4.72    4.53    4.43    4.33    4.12    4.01    3.90
     .999    7.29    7.00    6.71    6.40    6.25    6.09    5.76    5.59    5.42

 15  .50    0.977   0.989    1.00    1.01    1.02    1.02    1.03    1.04    1.05
     .90     2.06    2.02    1.97    1.92    1.90    1.87    1.82    1.79    1.76
     .95     2.54    2.48    2.40    2.33    2.29    2.25    2.16    2.11    2.07
     .975    3.06    2.96    2.86    2.76    2.70    2.64    2.52    2.46    2.40
     .99     3.80    3.67    3.52    3.37    3.29    3.21    3.05    2.96    2.87
     .995    4.42    4.25    4.07    3.88    3.79    3.69    3.48    3.37    3.26
     .999    6.08    5.81    5.54    5.25    5.10    4.95    4.64    4.48    4.31

 20  .50    0.966   0.977   0.989    1.00    1.01    1.01    1.02    1.03    1.03
     .90     1.94    1.89    1.84    1.79    1.77    1.74    1.68    1.64    1.61
     .95     2.35    2.28    2.20    2.12    2.08    2.04    1.95    1.90    1.84
     .975    2.77    2.68    2.57    2.46    2.41    2.35    2.22    2.16    2.09
     .99     3.37    3.23    3.09    2.94    2.86    2.78    2.61    2.52    2.42
     .995    3.85    3.68    3.50    3.32    3.22    3.12    2.92    2.81    2.69
     .999    5.08    4.82    4.56    4.29    4.15    4.00    3.70    3.54    3.38

 24  .50    0.961   0.972   0.983   0.994    1.00    1.01    1.02    1.02    1.03
     .90     1.88    1.83    1.78    1.73    1.70    1.67    1.61    1.57    1.53
     .95     2.25    2.18    2.11    2.03    1.98    1.94    1.84    1.79    1.73
     .975    2.64    2.54    2.44    2.33    2.27    2.21    2.08    2.01    1.94
     .99     3.17    3.03    2.89    2.74    2.66    2.58    2.40    2.31    2.21
     .995    3.59    3.42    3.25    3.06    2.97    2.87    2.66    2.55    2.43
     .999    4.64    4.39    4.14    3.87    3.74    3.59    3.29    3.14    2.97



TABLE A-4 (continued) Percentiles of the F distribution

Den.                                 Numerator df
 df  A          1       2       3       4       5       6       7       8       9
 30  .50    0.466   0.709   0.807   0.858   0.890   0.912   0.927   0.939   0.948
     .90     2.88    2.49    2.28    2.14    2.05    1.98    1.93    1.88    1.85
     .95     4.17    3.32    2.92    2.69    2.53    2.42    2.33    2.27    2.21
     .975    5.57    4.18    3.59    3.25    3.03    2.87    2.75    2.65    2.57
     .99     7.56    5.39    4.51    4.02    3.70    3.47    3.30    3.17    3.07
     .995    9.18    6.35    5.24    4.62    4.23    3.95    3.74    3.58    3.45
     .999    13.3    8.77    7.05    6.12    5.53    5.12    4.82    4.58    4.39

 60  .50    0.461   0.701   0.798   0.849   0.880   0.901   0.917   0.928   0.937
     .90     2.79    2.39    2.18    2.04    1.95    1.87    1.82    1.77    1.74
     .95     4.00    3.15    2.76    2.53    2.37    2.25    2.17    2.10    2.04
     .975    5.29    3.93    3.34    3.01    2.79    2.63    2.51    2.41    2.33
     .99     7.08    4.98    4.13    3.65    3.34    3.12    2.95    2.82    2.72
     .995    8.49    5.80    4.73    4.14    3.76    3.49    3.29    3.13    3.01
     .999    12.0    7.77    6.17    5.31    4.76    4.37    4.09    3.86    3.69

120  .50    0.458   0.697   0.793   0.844   0.875   0.896   0.912   0.923   0.932
     .90     2.75    2.35    2.13    1.99    1.90    1.82    1.77    1.72    1.68
     .95     3.92    3.07    2.68    2.45    2.29    2.18    2.09    2.02    1.96
     .975    5.15    3.80    3.23    2.89    2.67    2.52    2.39    2.30    2.22
     .99     6.85    4.79    3.95    3.48    3.17    2.96    2.79    2.66    2.56
     .995    8.18    5.54    4.50    3.92    3.55    3.28    3.09    2.93    2.81
     .999    11.4    7.32    5.78    4.95    4.42    4.04    3.77    3.55    3.38

  ∞  .50    0.455   0.693   0.789   0.839   0.870   0.891   0.907   0.918   0.927
     .90     2.71    2.30    2.08    1.94    1.85    1.77    1.72    1.67    1.63
     .95     3.84    3.00    2.60    2.37    2.21    2.10    2.01    1.94    1.88
     .975    5.02    3.69    3.12    2.79    2.57    2.41    2.29    2.19    2.11
     .99     6.63    4.61    3.78    3.32    3.02    2.80    2.64    2.51    2.41
     .995    7.88    5.30    4.28    3.72    3.35    3.09    2.90    2.74    2.62
     .999    10.8    6.91    5.42    4.62    4.10    3.74    3.47    3.27    3.10



TABLE A-4 (concluded) Percentiles of the F distribution

Den.                                 Numerator df
 df  A         10      12      15      20      24      30      60     120       ∞
 30  .50    0.955   0.966   0.978   0.989   0.994    1.00    1.01    1.02    1.02
     .90     1.82    1.77    1.72    1.67    1.64    1.61    1.54    1.50    1.46
     .95     2.16    2.09    2.01    1.93    1.89    1.84    1.74    1.68    1.62
     .975    2.51    2.41    2.31    2.20    2.14    2.07    1.94    1.87    1.79
     .99     2.98    2.84    2.70    2.55    2.47    2.39    2.21    2.11    2.01
     .995    3.34    3.18    3.01    2.82    2.73    2.63    2.42    2.30    2.18
     .999    4.24    4.00    3.75    3.49    3.36    3.22    2.92    2.76    2.59

 60  .50    0.945   0.956   0.967   0.978   0.983   0.989    1.00    1.01    1.01
     .90     1.71    1.66    1.60    1.54    1.51    1.48    1.40    1.35    1.29
     .95     1.99    1.92    1.84    1.75    1.70    1.65    1.53    1.47    1.39
     .975    2.27    2.17    2.06    1.94    1.88    1.82    1.67    1.58    1.48
     .99     2.63    2.50    2.35    2.20    2.12    2.03    1.84    1.73    1.60
     .995    2.90    2.74    2.57    2.39    2.29    2.19    1.96    1.83    1.69
     .999    3.54    3.32    3.08    2.83    2.69    2.55    2.25    2.08    1.89

120  .50    0.939   0.950   0.961   0.972   0.978   0.983   0.994    1.00    1.01
     .90     1.65    1.60    1.55    1.48    1.45    1.41    1.32    1.26    1.19
     .95     1.91    1.83    1.75    1.66    1.61    1.55    1.43    1.35    1.25
     .975    2.16    2.05    1.95    1.82    1.76    1.69    1.53    1.43    1.31
     .99     2.47    2.34    2.19    2.03    1.95    1.86    1.66    1.53    1.38
     .995    2.71    2.54    2.37    2.19    2.09    1.98    1.75    1.61    1.43
     .999    3.24    3.02    2.78    2.53    2.40    2.26    1.95    1.77    1.54

  ∞  .50    0.934   0.945   0.956   0.967   0.972   0.978   0.989   0.994    1.00
     .90     1.60    1.55    1.49    1.42    1.38    1.34    1.24    1.17    1.00
     .95     1.83    1.75    1.67    1.57    1.52    1.46    1.32    1.22    1.00
     .975    2.05    1.94    1.83    1.71    1.64    1.57    1.39    1.27    1.00
     .99     2.32    2.18    2.04    1.88    1.79    1.70    1.47    1.32    1.00
     .995    2.52    2.36    2.19    2.00    1.90    1.79    1.53    1.36    1.00
     .999    2.96    2.74    2.51    2.27    2.13    1.99    1.66    1.45    1.00

Source:  Reprinted  from  Table  5  of  Pearson  and  Hartley,  Biometrika  Tables  for  Statisticians,  Volume  2,  1972, 
published  by  the  Cambridge  University  Press,  on  behalf  of  The  Biometrika  Society,  by  permission  of  the  authors 
and  publishers. 




TABLE A-5 (concluded) Power function for two-sided t test

α = .01


Source: Reprinted, with permission, from D. B. Owen, Handbook of Statistical Tables (Reading, Mass.:
Addison-Wesley Publishing, 1962), pp. 32, 34. Courtesy of U.S. Atomic Energy Commission.




TABLE A-6 Durbin-Watson test bounds

Level of Significance α = .05

        p - 1 = 1     p - 1 = 2     p - 1 = 3     p - 1 = 4     p - 1 = 5
  n     dL     dU     dL     dU     dL     dU     dL     dU     dL     dU
 15   1.08   1.36   0.95   1.54   0.82   1.75   0.69   1.97   0.56   2.21
 16   1.10   1.37   0.98   1.54   0.86   1.73   0.74   1.93   0.62   2.15
 17   1.13   1.38   1.02   1.54   0.90   1.71   0.78   1.90   0.67   2.10
 18   1.16   1.39   1.05   1.53   0.93   1.69   0.82   1.87   0.71   2.06
 19   1.18   1.40   1.08   1.53   0.97   1.68   0.86   1.85   0.75   2.02
 20   1.20   1.41   1.10   1.54   1.00   1.68   0.90   1.83   0.79   1.99
 21   1.22   1.42   1.13   1.54   1.03   1.67   0.93   1.81   0.83   1.96
 22   1.24   1.43   1.15   1.54   1.05   1.66   0.96   1.80   0.86   1.94
 23   1.26   1.44   1.17   1.54   1.08   1.66   0.99   1.79   0.90   1.92
 24   1.27   1.45   1.19   1.55   1.10   1.66   1.01   1.78   0.93   1.90
 25   1.29   1.45   1.21   1.55   1.12   1.66   1.04   1.77   0.95   1.89
 26   1.30   1.46   1.22   1.55   1.14   1.65   1.06   1.76   0.98   1.88
 27   1.32   1.47   1.24   1.56   1.16   1.65   1.08   1.76   1.01   1.86
 28   1.33   1.48   1.26   1.56   1.18   1.65   1.10   1.75   1.03   1.85
 29   1.34   1.48   1.27   1.56   1.20   1.65   1.12   1.74   1.05   1.84
 30   1.35   1.49   1.28   1.57   1.21   1.65   1.14   1.74   1.07   1.83
 31   1.36   1.50   1.30   1.57   1.23   1.65   1.16   1.74   1.09   1.83
 32   1.37   1.50   1.31   1.57   1.24   1.65   1.18   1.73   1.11   1.82
 33   1.38   1.51   1.32   1.58   1.26   1.65   1.19   1.73   1.13   1.81
 34   1.39   1.51   1.33   1.58   1.27   1.65   1.21   1.73   1.15   1.81
 35   1.40   1.52   1.34   1.58   1.28   1.65   1.22   1.73   1.16   1.80
 36   1.41   1.52   1.35   1.59   1.29   1.65   1.24   1.73   1.18   1.80
 37   1.42   1.53   1.36   1.59   1.31   1.66   1.25   1.72   1.19   1.80
 38   1.43   1.54   1.37   1.59   1.32   1.66   1.26   1.72   1.21   1.79
 39   1.43   1.54   1.38   1.60   1.33   1.66   1.27   1.72   1.22   1.79
 40   1.44   1.54   1.39   1.60   1.34   1.66   1.29   1.72   1.23   1.79
 45   1.48   1.57   1.43   1.62   1.38   1.67   1.34   1.72   1.29   1.78
 50   1.50   1.59   1.46   1.63   1.42   1.67   1.38   1.72   1.34   1.77
 55   1.53   1.60   1.49   1.64   1.45   1.68   1.41   1.72   1.38   1.77
 60   1.55   1.62   1.51   1.65   1.48   1.69   1.44   1.73   1.41   1.77
 65   1.57   1.63   1.54   1.66   1.50   1.70   1.47   1.73   1.44   1.77
 70   1.58   1.64   1.55   1.67   1.52   1.70   1.49   1.74   1.46   1.77
 75   1.60   1.65   1.57   1.68   1.54   1.71   1.51   1.74   1.49   1.77
 80   1.61   1.66   1.59   1.69   1.56   1.72   1.53   1.74   1.51   1.77
 85   1.62   1.67   1.60   1.70   1.57   1.72   1.55   1.75   1.52   1.77
 90   1.63   1.68   1.61   1.70   1.59   1.73   1.57   1.75   1.54   1.78
 95   1.64   1.69   1.62   1.71   1.60   1.73   1.58   1.75   1.56   1.78
100   1.65   1.69   1.63   1.72   1.61   1.74   1.59   1.76   1.57   1.78



TABLE A-6 (concluded) Durbin-Watson test bounds

Level of Significance α = .01

        p - 1 = 1     p - 1 = 2     p - 1 = 3     p - 1 = 4     p - 1 = 5
  n     dL     dU     dL     dU     dL     dU     dL     dU     dL     dU
 15   0.81   1.07   0.70   1.25   0.59   1.46   0.49   1.70   0.39   1.96
 16   0.84   1.09   0.74   1.25   0.63   1.44   0.53   1.66   0.44   1.90
 17   0.87   1.10   0.77   1.25   0.67   1.43   0.57   1.63   0.48   1.85
 18   0.90   1.12   0.80   1.26   0.71   1.42   0.61   1.60   0.52   1.80
 19   0.93   1.13   0.83   1.26   0.74   1.41   0.65   1.58   0.56   1.77
 20   0.95   1.15   0.86   1.27   0.77   1.41   0.68   1.57   0.60   1.74
 21   0.97   1.16   0.89   1.27   0.80   1.41   0.72   1.55   0.63   1.71
 22   1.00   1.17   0.91   1.28   0.83   1.40   0.75   1.54   0.66   1.69
 23   1.02   1.19   0.94   1.29   0.86   1.40   0.77   1.53   0.70   1.67
 24   1.04   1.20   0.96   1.30   0.88   1.41   0.80   1.53   0.72   1.66
 25   1.05   1.21   0.98   1.30   0.90   1.41   0.83   1.52   0.75   1.65
 26   1.07   1.22   1.00   1.31   0.93   1.41   0.85   1.52   0.78   1.64
 27   1.09   1.23   1.02   1.32   0.95   1.41   0.88   1.51   0.81   1.63
 28   1.10   1.24   1.04   1.32   0.97   1.41   0.90   1.51   0.83   1.62
 29   1.12   1.25   1.05   1.33   0.99   1.42   0.92   1.51   0.85   1.61
 30   1.13   1.26   1.07   1.34   1.01   1.42   0.94   1.51   0.88   1.61
 31   1.15   1.27   1.08   1.34   1.02   1.42   0.96   1.51   0.90   1.60
 32   1.16   1.28   1.10   1.35   1.04   1.43   0.98   1.51   0.92   1.60
 33   1.17   1.29   1.11   1.36   1.05   1.43   1.00   1.51   0.94   1.59
 34   1.18   1.30   1.13   1.36   1.07   1.43   1.01   1.51   0.95   1.59
 35   1.19   1.31   1.14   1.37   1.08   1.44   1.03   1.51   0.97   1.59
 36   1.21   1.32   1.15   1.38   1.10   1.44   1.04   1.51   0.99   1.59
 37   1.22   1.32   1.16   1.38   1.11   1.45   1.06   1.51   1.00   1.59
 38   1.23   1.33   1.18   1.39   1.12   1.45   1.07   1.52   1.02   1.58
 39   1.24   1.34   1.19   1.39   1.14   1.45   1.09   1.52   1.03   1.58
 40   1.25   1.34   1.20   1.40   1.15   1.46   1.10   1.52   1.05   1.58
 45   1.29   1.38   1.24   1.42   1.20   1.48   1.16   1.53   1.11   1.58
 50   1.32   1.40   1.28   1.45   1.24   1.49   1.20   1.54   1.16   1.59
 55   1.36   1.43   1.32   1.47   1.28   1.51   1.25   1.55   1.21   1.59
 60   1.38   1.45   1.35   1.48   1.32   1.52   1.28   1.56   1.25   1.60
 65   1.41   1.47   1.38   1.50   1.35   1.53   1.31   1.57   1.28   1.61
 70   1.43   1.49   1.40   1.52   1.37   1.55   1.34   1.58   1.31   1.61
 75   1.45   1.50   1.42   1.53   1.39   1.56   1.37   1.59   1.34   1.62
 80   1.47   1.52   1.44   1.54   1.42   1.57   1.39   1.60   1.36   1.62
 85   1.48   1.53   1.46   1.55   1.43   1.58   1.41   1.60   1.39   1.63
 90   1.50   1.54   1.47   1.56   1.45   1.59   1.43   1.61   1.41   1.64
 95   1.51   1.55   1.49   1.57   1.47   1.60   1.45   1.62   1.42   1.64
100   1.52   1.56   1.50   1.58   1.48   1.60   1.46   1.63   1.44   1.65

Source:  Reprinted,  with  permission,  from  J.  Durbin  and  G.  S.  Watson,  “Testing  for  Serial  Correlation  in 
Least  Squares  Regression.  II,”  Biometrika  38  (1951),  pp.  159-78. 
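
The bounds in Table A-6 are compared with the Durbin-Watson statistic D computed from the ordered residuals. The following is a minimal Python sketch, assuming the usual one-sided test against positive autocorrelation: conclude H0 if D exceeds dU, conclude Ha if D is below dL, and treat the test as inconclusive otherwise. The function names and the residual values are hypothetical; the bounds are the tabled values for n = 20, p - 1 = 1, α = .05.

```python
def durbin_watson(residuals):
    """D = sum over t = 2..n of (e_t - e_{t-1})^2, divided by sum of e_t^2."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def dw_decision(d, d_lower, d_upper):
    """Decision for the test against positive autocorrelation (assumed rule)."""
    if d > d_upper:
        return "conclude H0"
    if d < d_lower:
        return "conclude Ha"
    return "inconclusive"


# Made-up residuals for n = 20 time-ordered observations (illustration only).
residuals = [0.5, 0.7, 0.4, 0.6, -0.2, -0.5, -0.6, -0.3, 0.1, 0.4,
             0.6, 0.3, -0.1, -0.4, -0.7, -0.5, -0.2, 0.2, 0.5, 0.3]

d = durbin_watson(residuals)
# Bounds from Table A-6: n = 20, p - 1 = 1, alpha = .05.
decision = dw_decision(d, d_lower=1.20, d_upper=1.41)
print(round(d, 3), decision)
```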




TABLE A-7 Table of z' transformation of correlation coefficient

  r      z'        r      z'        r      z'        r      z'
  ρ       ζ        ρ       ζ        ρ       ζ        ρ       ζ
.00   .0000      .25   .2554      .50   .5493      .75    .973
.01   .0100      .26   .2661      .51   .5627      .76    .996
.02   .0200      .27   .2769      .52   .5763      .77   1.020
.03   .0300      .28   .2877      .53   .5901      .78   1.045
.04   .0400      .29   .2986      .54   .6042      .79   1.071
.05   .0500      .30   .3095      .55   .6184      .80   1.099
.06   .0601      .31   .3205      .56   .6328      .81   1.127
.07   .0701      .32   .3316      .57   .6475      .82   1.157
.08   .0802      .33   .3428      .58   .6625      .83   1.188
.09   .0902      .34   .3541      .59   .6777      .84   1.221
.10   .1003      .35   .3654      .60   .6931      .85   1.256
.11   .1104      .36   .3769      .61   .7089      .86   1.293
.12   .1206      .37   .3884      .62   .7250      .87   1.333
.13   .1307      .38   .4001      .63   .7414      .88   1.376
.14   .1409      .39   .4118      .64   .7582      .89   1.422
.15   .1511      .40   .4236      .65   .7753      .90   1.472
.16   .1614      .41   .4356      .66   .7928      .91   1.528
.17   .1717      .42   .4477      .67   .8107      .92   1.589
.18   .1820      .43   .4599      .68   .8291      .93   1.658
.19   .1923      .44   .4722      .69   .8480      .94   1.738
.20   .2027      .45   .4847      .70   .8673      .95   1.832
.21   .2132      .46   .4973      .71   .8872      .96   1.946
.22   .2237      .47   .5101      .72   .9076      .97   2.092
.23   .2342      .48   .5230      .73   .9287      .98   2.298
.24   .2448      .49   .5361      .74   .9505      .99   2.647

Source: Abridged from Table 14 of Pearson and Hartley, Biometrika Tables for Statisticians,
Volume 1, 1966, published by the Cambridge University Press, on behalf of The Biometrika
Society, by permission of the authors and publishers.
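
The entries of Table A-7 can be reproduced from the Fisher transformation z' = (1/2) ln[(1 + r) / (1 - r)], which is the inverse hyperbolic tangent of r. The following is a minimal Python sketch; the function names fisher_z and inverse_fisher_z are illustrative and not part of the text.

```python
import math


def fisher_z(r):
    """Return z' = 0.5 * ln((1 + r) / (1 - r)) for a correlation -1 < r < 1."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def inverse_fisher_z(z):
    """Return the correlation r corresponding to a transformed value z'."""
    return math.tanh(z)


# Compare with the tabled entries for r = .25, .50 and .90.
for r in (0.25, 0.50, 0.90):
    print(r, round(fisher_z(r), 4))
```

The inverse function is convenient when confidence limits computed on the z' scale must be converted back to the correlation scale.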


SENIC  data  set 


The primary objective of the Study on the Efficacy of Nosocomial Infection
Control (SENIC Project) was to determine whether infection surveillance and
control programs have reduced the rates of nosocomial (hospital-acquired)
infection in United States hospitals. This data set consists of a random sample
of 113 hospitals selected from the original 338 hospitals surveyed.

Each line of the data set has an identification number and provides information
on 11 other variables for a single hospital. The data presented here are for
the 1975-76 study period. The 12 variables are:

Variable
Number   Variable Name                Description

   1     Identification number        1-113

   2     Length of stay               Average length of stay of all patients in hospital
                                      (in days)

   3     Age                          Average age of patients (in years)

   4     Infection risk               Average estimated probability of acquiring infection
                                      in hospital (in percent)

   5     Routine culturing ratio      Ratio of number of cultures performed to number of
                                      patients without signs or symptoms of hospital-
                                      acquired infection, times 100

   6     Routine chest X-ray ratio    Ratio of number of X rays performed to number of
                                      patients without signs or symptoms of pneumonia,
                                      times 100

   7     Number of beds               Average number of beds in hospital during study
                                      period

   8     Medical school affiliation   1 = Yes, 2 = No

   9     Region                       Geographic region, where: 1 = NE, 2 = NC, 3 = S,
                                      4 = W

  10     Average daily census         Average number of patients in hospital per day
                                      during study period

  11     Number of nurses             Average number of full-time equivalent registered
                                      and licensed practical nurses during study period
                                      (number full time plus one half the number
                                      part time)

  12     Available facilities         Percent of 35 potential facilities and services that
         and services                 are provided by the hospital
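
When the data set is read by computer, each line can be converted to a record with the 12 variables in the order listed above. The following is a minimal Python sketch, assuming one whitespace-separated line per hospital; the class name, field names, and helper functions are hypothetical, and the part-time and full-time counts in the nurse example are made up for illustration.

```python
from dataclasses import dataclass

# Region codes as defined for variable 9.
REGIONS = {1: "NE", 2: "NC", 3: "S", 4: "W"}


@dataclass
class Hospital:
    ident: int               # 1  identification number
    length_of_stay: float    # 2  days
    age: float               # 3  years
    infection_risk: float    # 4  percent
    culturing_ratio: float   # 5
    xray_ratio: float        # 6
    beds: int                # 7
    med_school: bool         # 8  coded 1 = Yes, 2 = No
    region: str              # 9
    census: int              # 10
    nurses: int              # 11 full-time equivalents
    facilities_pct: float    # 12 percent of 35 facilities


def parse_row(line):
    """Convert one whitespace-separated data line into a Hospital record."""
    f = line.split()
    if len(f) != 12:
        raise ValueError("expected 12 fields, got %d" % len(f))
    return Hospital(
        int(f[0]), float(f[1]), float(f[2]), float(f[3]), float(f[4]),
        float(f[5]), int(f[6]), f[7] == "1", REGIONS[int(f[8])],
        int(f[9]), int(f[10]), float(f[11]),
    )


def fte_nurses(full_time, part_time):
    """Full-time equivalents: number full time plus one half the number part time."""
    return full_time + 0.5 * part_time


h = parse_row("1 7.13 55.7 4.1 9.0 39.6 279 2 4 207 241 60.0")
print(h.region, h.med_school, fte_nurses(200, 82))
```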


Reference: Special Issue, “The SENIC Project,” American Journal of Epidemiology 111 (1980), pp. 465-653.

Data  obtained  from:  Robert  W.  Haley,  M.D.,  Hospital  Infections  Program,  Center  for  Infectious  Diseases, 
Centers  for  Disease  Control,  Atlanta,  Georgia  30333. 


SENIC data set (continued)

  1      2     3    4     5      6    7  8  9   10   11    12
  1   7.13  55.7  4.1   9.0   39.6  279  2  4  207  241  60.0
  2   8.82  58.0  1.6   3.8   51.7   80  2  2   51   52  40.0
  3   8.34  56.9  2.7   8.1   74.0  107  2  3   82   54  20.0
  4   8.95  53.7  5.6  18.9  122.8  147  2  4   53  148  40.0
  5  11.20  56.5  5.7  34.5   88.9  180  2  1  134  151  40.0
  6   9.76  50.9  5.1  21.9   97.0  150  2  2  147  106  40.0
  7   9.68  57.8  4.6  16.7   79.0  186  2  3  151  129  40.0
  8  11.18  45.7  5.4  60.5   85.8  640  1  2  399  360  60.0
  9   8.67  48.2  4.3  24.4   90.8  182  2  3  130  118  40.0
 10   8.84  56.3  6.3  29.6   82.6   85  2  1   59   66  40.0
 11  11.07  53.2  4.9  28.5  122.0  768  1  1  591  656  80.0
 12   8.30  57.2  4.3   6.8   83.8  167  2  3  105   59  40.0
 13  12.78  56.8  7.7  46.0  116.9  322  1  1  252  349  57.1
 14   7.58  56.7  3.7  20.8   88.0   97  2  2   59   79  37.1
 15   9.00  56.3  4.2  14.6   76.4   72  2  3   61   38  17.1
 16  11.08  50.2  5.5  18.6   63.6  387  2  3  326  405  57.1
 17   8.28  48.1  4.5  26.0  101.8  108  2  4   84   73  37.1
 18  11.62  53.9  6.4  25.5   99.2  133  2  1  113  101  37.1
 19   9.06  52.8  4.2   6.9   75.9  134  2  2  103  125  37.1
 20   9.35  53.8  4.1  15.9   80.9  833  2  3  547  519  77.1
 21   7.53  42.0  4.2  23.1   98.9   95  2  4   47   49  17.1
 22  10.24  49.0  4.8  36.3  112.6  195  2  2  163  170  37.1
 23   9.78  52.3  5.0  17.6   95.9  270  1  1  240  198  57.1
 24   9.84  62.2  4.8  12.0   82.3  600  2  3  468  497  57.1
 25   9.20  52.2  4.0  17.5   71.1  298  1  4  244  236  57.1
 26   8.28  49.5  3.9  12.0  113.1  546  1  2  413  436  57.1
 27   9.31  47.2  4.5  30.2  101.3  170  2  1  124  173  37.1
 28   8.19  52.1  3.2  10.8   59.2  176  2  1  156   88  37.1
 29  11.65  54.5  4.4  18.6   96.1  248  2  1  217  189  37.1
 30   9.89  50.5  4.9  17.7  103.6  167  2  2  113  106  37.1
 31  11.03  49.9  5.0  19.7  102.1  318  2  1  270  335  57.1
 32   9.84  53.0  5.2  17.7   72.6  210  2  2  200  239  54.3
 33  11.77  54.1  5.3  17.3   56.0  196  2  1  164  165  34.3

34 

13 

.59 

54. 

.0 

6, 

.1 

24. 

.2 

111 

.7 

312 

2 

1 

258 

169 

54, 

.3 

35 

9 

.74 

54, 

.4 

6 

.3 

11 

.4 

76 

.  1 

221 

2 

2 

170 

172 

54 

.3 

36 

10 

.33 

55 

.8 

5 

.0 

21 

.2 

104 

.3 

266 

2 

1 

181 

149 

54 

.3 

37 

9 

.97 

58. 

.2 

2, 

.8 

16 

.5 

76 

.5 

90 

2 

2 

69 

42 

34 

.3 

38 

7 

.84 

49 

.  1 

4 

.6 

7 

.  1 

87 

.9 

60 

2 

3 

50 

45 

34 

.3 

39 

10 

.47 

53 

.2 

4 

.  1 

5 

.  7 

69 

.  1 

196 

2 

2 

168 

153 

54 

.3 

40 

8 

.  16 

60 

.9 

1 

.3 

1 

.9 

58 

.0 

73 

2 

3 

49 

21 

14 

.3 

41 

8 

.48 

51 

.  1 

3 

.  7 

12 

.  1 

92 

.8 

166 

2 

3 

145 

118 

34 

.3 

42 

10 

.72 

53 

.8 

4 

.7 

23 

.2 

94 

.  1 

113 

2 

3 

90 

107 

34 

.3 

43 

11 

.20 

45 

.0 

3 

.0 

7 

.0 

78 

.9 

130 

2 

3 

95 

56 

34 

.3 

44 

10 

.  12 

51 

.7 

5 

.6 

14 

.9 

79 

.1 

362 

1 

3 

31S 

264 

54 

.3 

45 

8 

.37 

50 

.7 

5 

.5 

15 

.  1 

84 

.8 

115 

2 

2 

96 

88 

34 

.3 

46 

10 

.16 

54 

.2 

4 

.6 

8 

.4 

51 

.5 

831 

1 

4 

581 

629 

74 

.3 

47 

19 

.56 

59 

.9 

6 

.5 

17 

.2 

113 

.7 

306 

2 

1 

273 

172 

51 

.4 

48 

10 

.90 

57 

.2 

5 

.5 

10 

.6 

71 

.9 

593 

2 

2 

446 

211 

51 

.4 

49 

7 

.67 

51 

.7 

1 

.8 

2 

.5 

40 

.4 

106 

2 

3 

93 

35 

11 

.4 

50 

8 

.88 

51 

.5 

4 

.2 

10 

.  1 

86 

.9 

305 

2 

3 

238 

197 

51 

.4 

51 

11 

.48 

57 

.6 

5 

.6 

20 

.3 

82 

.0 

252 

2 

1 

207 

251 

51 

.4 

52 

9 

.23 

51 

.6 

4 

.3 

11 

.6 

42 

.6 

620 

2 

2 

413 

420 

71 

.4 

53 

11 

.41 

61 

.  1 

7 

.6 

16 

.6 

97 

.9 

535 

2 

3 

330 

273 

51 

.4 

54 

12 

.07 

43 

.7 

7 

.8 

52 

.4 

105 

.3 

157 

2 

2 

115 

76 

31 

.4 

55 

8 

.63 

54 

.0 

3 

.1 

8 

.4 

56 

.2 

76 

2 

1 

39 

44 

31 

.4 

56 

11 

.  15 

56 

.5 

3 

.9 

7 

.7 

73 

.9 

281 

2 

1 

217 

199 

51 

.4 

SENIC data set (concluded)

  1     2    3   4    5     6   7 8 9  10  11   12

 57  7.14 59.0 3.7  2.6  75.8  70 2 4  37  35 31.4
 58  7.65 47.1 4.3 16.4  65.7 318 2 4 265 314 51.4
 59 10.73 50.6 3.9 19.3 101.0 445 1 2 374 345 51.4
 60 11.46 56.9 4.5 15.6  97.7 191 2 3 153 132 31.4
 61 10.42 58.0 3.4  8.0  59.0 119 2 1  67  64 31.4
 62 11.18 51.0 5.7 18.8  55.9 595 1 2 546 392 68.6
 63  7.93 64.1 5.4  7.5  98.1  68 2 4  42  49 28.6
 64  9.66 52.1 4.4  9.9  98.3  83 2 2  66  95 28.6
 65  7.78 45.5 5.0 20.9  71.6 489 2 3 391 329 48.6
 66  9.42 50.6 4.3 24.8  62.8 508 2 1 421 528 48.6
 67 10.02 49.5 4.4  8.3  93.0 265 2 2 191 202 48.6
 68  8.58 55.0 3.7  7.4  95.9 304 2 3 248 218 48.6
 69  9.61 52.4 4.5  6.9  87.2 487 2 3 404 220 48.6
 70  8.03 54.2 3.5 24.3  87.3  97 2 1  65  55 28.6
 71  7.39 51.0 4.2 14.6  88.4  72 2 2  38  67 28.6
 72  7.08 52.0 2.0 12.3  56.4  87 2 3  52  57 28.6
 73  9.53 51.5 5.2 15.0  65.7 298 2 3 241 193 48.6
 74 10.05 52.0 4.5 36.7  87.5 184 1 1 144 151 68.6
 75  8.45 38.8 3.4 12.9  85.0 235 2 2 143 124 48.6
 76  6.70 48.6 4.5 13.0  80.8  76 2 4  51  79 28.6
 77  8.90 49.7 2.9 12.7  86.9  52 2 1  37  35 28.6
 78 10.23 53.2 4.9  9.9  77.9 752 1 2 595 446 68.6
 79  8.88 55.8 4.4 14.1  76.8 237 2 2 165 182 48.6
 80 10.30 59.6 5.1 27.8  88.9 175 2 2 113  73 45.7
 81 10.79 44.2 2.9  2.6  56.6 461 1 2 320 196 65.7
 82  7.94 49.5 3.5  6.2  92.3 195 2 2 139 116 45.7
 83  7.63 52.1 5.5 11.6  61.1 197 2 4 109 110 45.7
 84  8.77 54.5 4.7  5.2  47.0 143 2 4  85  87 25.7
 85  8.09 56.9 1.7  7.6  56.9  92 2 3  61  61 45.7
 86  9.05 51.2 4.1 20.5  79.8 195 2 3 127 112 45.7
 87  7.91 52.8 2.9 11.9  79.5 477 2 3 349 188 65.7
 88 10.39 54.6 4.3 14.0  88.3 353 2 2 223 200 65.7
 89  9.36 54.1 4.8 18.3  90.6 165 2 1 127 158 45.7
 90 11.41 50.4 5.8 23.8  73.0 424 1 3 359 335 45.7
 91  8.86 51.3 2.9  9.5  87.5 100 2 3  65  53 25.7
 92  8.93 56.0 2.0  6.2  72.5  95 2 3  59  56 25.7
 93  8.92 53.9 1.3  2.2  79.5  56 2 2  40  14  5.7
 94  8.15 54.9 5.3 12.3  79.8  99 2 4  55  71 25.7
 95  9.77 50.2 5.3 15.7  89.7 154 2 2 123 148 25.7
 96  8.54 56.1 2.5 27.0  82.5  98 2 1  57  75 45.7
 97  8.66 52.8 3.8  6.8  69.5 246 2 3 178 177 45.7
 98 12.01 52.8 4.8 10.8  96.9 298 2 1 237 115 45.7
 99  7.95 51.8 2.3  4.6  54.9 163 2 3 128  93 42.9
100 10.15 51.9 6.2 16.4  59.2 568 1 3 452 371 62.9
101  9.76 53.2 2.6  6.9  80.1  64 2 4  47  55 22.9
102  9.89 45.2 4.3 11.8 108.7 190 2 1 141 112 42.9
103  7.14 57.6 2.7 13.1  92.6  92 2 4  40  50 22.9
104 13.95 65.9 6.6 15.6 133.5 356 2 1 308 182 62.9
105  9.44 52.5 4.5 10.9  58.5 297 2 3 230 263 42.9
106 10.80 63.9 2.9  1.6  57.4 130 2 3  69  62 22.9
107  7.14 51.7 1.4  4.1  45.7 115 2 3  90  19 22.9
108  8.02 55.0 2.1  3.8  46.5  91 2 2  44  32 22.9
109 11.80 53.8 5.7  9.1 116.9 571 1 2 441 469 62.9
110  9.50 49.3 5.8 42.0  70.9  98 2 3  68  46 22.9
111  7.70 56.9 4.4 12.2  67.9 129 2 4  85 136 62.9
112 17.94 56.2 5.9 26.4  91.8 835 1 1 791 407 62.9
113  9.41 59.5 3.1 20.6  91.7  29 2 3  20  22 22.9

SMSA  data  set 


This data set provides information for 141 large Standard Metropolitan Statistical
Areas (SMSA’s) in the United States. A standard metropolitan statistical
area includes a city (or cities) of specified population size which constitutes the
central city and the county (or counties) in which it is located, as well as contiguous
counties when the economic and social relationships between the central and
contiguous counties meet specified criteria of metropolitan character and integration.
An SMSA may have up to three central cities and may cross state lines.

Each line of the data set has an identification number and provides information
on 11 other variables for a single SMSA. The information generally pertains
to the years 1976 and 1977, the most recent information available at the time.
The 12 variables are:


Variable
Number   Variable Name                 Description

   1     Identification number         1-141
   2     Land area                     In square miles
   3     Total population              Estimated 1977 population (in thousands)
   4     Percent of population in      Percent of 1976 SMSA population in central city
           central cities              or cities
   5     Percent of population 65      Percent of 1976 SMSA population 65 years old or
           or older                    older
   6     Number of active physicians   Number of professionally active nonfederal
                                       physicians as of December 31, 1977
   7     Number of hospital beds       Total number of beds, cribs, and bassinettes
                                       during 1977
   8     Percent high school           Percent of adult population (persons 25 years old
           graduates                   or older) who completed 12 or more years of
                                       school, according to the 1970 Census of the
                                       Population
   9     Civilian labor force          Total number of persons in civilian labor force
                                       (persons 16 years old or older classified as
                                       employed or unemployed) in 1977 (in thousands)
  10     Total personal income         Total current income received in 1976 by residents
                                       of the SMSA from all sources, before deduction
                                       of income and other personal taxes but after
                                       deduction of personal contributions to social
                                       security and other social insurance programs
                                       (in millions of dollars)
  11     Total serious crimes          Total number of serious crimes in 1977, including
                                       murder, rape, robbery, aggravated assault,
                                       burglary, larceny-theft, and motor vehicle theft,
                                       as reported by law enforcement agencies
  12     Geographic region             Geographic region classification is that used by
                                       the U.S. Bureau of the Census, where: 1 = NE,
                                       2 = NC, 3 = S, 4 = W


Data  obtained  from:  U.S.  Bureau  of  the  Census,  State  and  Metropolitan  Area  Data  Book,  1979  (a  Statistical 
Abstract  Supplement). 
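Since each line of the data set carries an identification number followed by the 11 other variables, a record can be read as 12 whitespace-separated fields. The sketch below is one possible way to do that in Python; the class name, field names, and helper functions are hypothetical choices made for illustration, and the units in the comments follow the variable descriptions above.

```python
from dataclasses import dataclass

# Region codes as given in the description of variable 12.
REGIONS = {1: "NE", 2: "NC", 3: "S", 4: "W"}


@dataclass
class SMSARecord:
    ident: int                # variable 1, 1-141
    land_area: float          # variable 2, square miles
    population: float         # variable 3, thousands (1977 estimate)
    pct_central_city: float   # variable 4
    pct_65_or_older: float    # variable 5
    physicians: int           # variable 6
    hospital_beds: int        # variable 7
    pct_high_school: float    # variable 8
    labor_force: float        # variable 9, thousands
    personal_income: float    # variable 10, millions of dollars
    serious_crimes: int       # variable 11
    region: int               # variable 12, key into REGIONS


def parse_smsa_line(line: str) -> SMSARecord:
    """Parse one whitespace-separated data line into an SMSARecord."""
    f = line.split()
    if len(f) != 12:
        raise ValueError(f"expected 12 fields, got {len(f)}")
    return SMSARecord(
        int(f[0]), float(f[1]), float(f[2]), float(f[3]), float(f[4]),
        int(f[5]), int(f[6]), float(f[7]), float(f[8]), float(f[9]),
        int(f[10]), int(f[11]),
    )


def crimes_per_1000(rec: SMSARecord) -> float:
    """Serious crimes per 1,000 residents (population is stored in thousands)."""
    return rec.serious_crimes / rec.population
```

Because population and labor force are recorded in thousands and income in millions of dollars, any per-capita quantity derived from a record needs the corresponding scale factor; `crimes_per_1000` relies on the population already being in thousands.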


SMSA data set (continued)

  1     2    3     4    5     6     7    8      9    10     11 12

  1  1384 9387  78.1 12.3 25627 69678 50.1 4083.9 72100 709234  1
  2  4069 7031  44.0 10.0 15389 39699 62.0 3353.6 52737 499813  4
  3  3719 7017  43.9  9.4 13326 43292 53.9 3305.9 54542 393162  2
  4  3553 4794  37.4 10.7  9724 33731 50.6 2066.3 33216 198102  1
  5  3916 4370  29.9  8.8  6402 24167 52.2 1966.7 32906 294466  2
  6  2480 3182  31.5 10.5  8502 16751 66.1 1514.5 26573 255162  4
  7  2815 3033  23.1  6.7  7340 16941 68.3 1541.9 25663 177355  3
  8  1218 2688   0.0  8.8  5255 22137 62.9 1213.3 21524 127567  1
  9  8360 2673  46.3  8.2  4047 14347 53.6 1321.2 18350 193125  3
 10  6794 2512  60.1  6.3  4562 14333 51.7 1272.7 18221 162976  3
 11  4935 2380  21.8 11.0  4071 17752 47.8 1061.2 16120 137479  2
 12  3049 2294  19.5 12.1  4005 21149 53.4  967.5 15826  69989  1
 13  2259 2147  38.6  9.3  5141 16485 44.6  966.8 14246 138214  3
 14  4647 2037  31.5  9.2  3916 12815 65.1 1032.2 14542 112642  2
 15  1008 1969  16.6 10.3  4006 16704 55.9  935.5 15953 106646  1
 16  1519 1950  31.8 10.5  4094 12545 54.6  906.0 14684 102816  2
 17  4326 1832  23.6  7.3  3064  9976 50.4  867.2 12107 106482  3
 18   782 1801  28.4  7.8  3119  8656 70.5  915.2 12591 113821  4
 19  4261 1683  48.6  9.7  3396  7552 65.3  644.3 10392 112359  4
 20  4651 1464  38.8  7.7  3380  8517 67.4  729.2 10375 116861  4
 21  2042 1441  24.5 16.5  4071 10039 51.9  681.7 10166 116304  3
 22  4226 1427  38.1  9.8  3285  5392 67.8  699.8 10918  91399  4
 23  1456 1427  46.7 10.4  2484  8555 56.8  710.4 10104  63695  2
 24  2045 1380  37.2 21.4  1949  8863 50.7  543.2  7989  89257  3
 25  2149 1375  29.8 10.6  2530  8354 48.4  617.6  9037  68319  2
 26  1590 1313  30.1 10.9  2296  9988 50.4  565.7  8411  67965  1
 27 27293 1306  25.3 12.3  2018  6323 57.4  510.6  7399  99293  4
 28  3341 1293  35.8 10.1  2289  7593 59.9  656.3  9106  81510  2
 29  9155 1254  53.8 11.1  2280  6450 60.1  575.2  7766 107370  4
 30  1300 1217  47.6  6.8  2794  4989 69.0  610.8  9215  76570  4
 31  3072 1144  68.0  9.3  2181  7497 56.0  549.6  7736  61381  2
 32  1967 1133  51.1  8.8  2520  8467 45.8  460.5  7038  69285  3
 33  3650 1121  34.6 11.1  2358  6224 62.9  539.3  7792  77316  4
 34  2460 1087  49.6  8.4  1874  7706 59.9  510.7  6658  62603  2
 35  2527 1025  78.7  8.4  1760  7664 46.5  391.1  5582  62694  3
 36  2966  970  26.9 10.3  2053  6604 56.3  450.4  6966  54854  1
 37  3434  929  28.9  8.3  1844  3215 65.1  422.6  5909  72410  4
 38  1392  883  37.2  9.8  1579  6087 46.5  396.8  5705  45642  3
 39  2298  886  76.2  9.0  1644  7673 48.2  394.6  5185  52094  3
 40  1219  864  31.7 20.6  1396  6158 55.4  352.8  5879  68109  3
 41  1708  833  24.0  8.8  1062  5315 56.2  367.5  5489  52606  2
 42  8565  822  29.7  7.3  1604  3485 67.6  349.3  4655  49111  4
 43  3358  805  35.1 11.3  1649  5512 44.9  359.1  4941  42786  3
 44  2624  794  30.4 12.2  1532  4730 55.2  356.5  5094  30771  1
 45  2187  777  47.0 10.2  1098  4342 51.9  355.4  5142  46213  2
 46  3214  774  47.7  9.4  1285  3459 40.3  401.7  4924  34941  3
 47  3491  769  48.5  9.7  1496  5620 59.6  362.3  4798  44513  3
 48  4080  773  59.6  9.9  1597  7496 47.3  380.9  4600  33936  3
 49   596  723 100.0  6.0  1260  2819 66.0  319.9  5181  46984  4
 50  3199  694  80.6  8.7   983  4749 50.8  292.4  4127  43010  3
 51   903  661  37.3  9.6   948  4064 55.6  293.3  4102  34725  2
 52  2419  647  27.8  9.9  1250  2870 57.8  286.8  3860  30829  1
 53   938  644  48.1  7.4   614  3016 50.0  280.9  4177  35106  2
 54  1951  629  28.4 14.5   696  4843 47.9  271.5  3667  14868  1
 55  1490  624  33.1 11.9   827  3818 47.4  300.2  4144  19090  1
 56  5677  610  55.8 10.5   760  3883 56.2  292.0  4$35  32146  3
 57  1525  597  55.7  8.3   751  3234 44.9  318.5  3777  37070  3
 58  2528  593  19.2 10.2   798  3135 55.4  274.1  3489  44442  3
 59   312  594  19.5  7.5   769  2463 55.0  298.7  4352  29100  1
 60  1537  581  63.8  8.7  1234  5160 62.7  272.6  3725  32271  2
 61  1420  576  32.6  9.5   833  2950 54.0  280.8  3553  26645  2
 62    47  564  41.9 11.9   745  3352 36.3  258.9  3915  29157  1
 63  1023  541  35.1 10.0   639  3144 52.1  234.1  3437  22111  2
 64  2115  526  19.9  9.1   676  2296 38.8  253.3  2962  30684  3
 65  1182  514  32.4  7.4   518  2515 52.4  216.8  3627  35201  2
 66  1165  516  14.5  8.6   746  4277 54.4  237.1  3724  31358  3
 67   476  492   8.9 10.9   787  2778 60.1  218.4  3603  24787  1
 68  1553  487  50.0  8.0  2207  4931 52.0  257.2  2991  24269  3
 69  2023  477  22.1 21.8   752  2317 55.7  194.2  3283  36418  3
 70  2766  474  67.9  7.7   679  3873 56.3  224.0  2598  29967  3

SMSA data set (continued)

  1     2    3     4    5     6     7    8      9    10     11 12

 71  5966  472  39.5  9.6   737  1907 52.7  246.6  3007  38205  4
 72  1863  468  50.4  7.7   674  2989 63.8  194.8  2747  25159  4
 73   192  462  60.5 10.8   617  1789 44.1  212.6  3158  27161  1
 74  9240  455  67.0 10.3  1123  2347 63.1  183.6  2598  41649  4
 75  2277  455  39.5  7.5   512  1788 61.9  221.1  2853  20053  2
 76  1630  449  41.9 10.7   724  4395 50.0  198.0  2445  17596  3
 77  1617  435  71.0  6.9   518  2031 54.1  197.9  2617  31539  3
 78  1057  435  90.7  6.1   479  2551 51.1  163.4  2012  25650  3
 79  1624  429  13.4 11.0   832  2938 55.4  207.8  2885  16985  1
 80  1676  423  36.6  9.2   505  3297 60.7  156.3  2689  24266  4
 81  2818  425  48.5  9.3   540  2694 42.3  172.8  2162  22374  3
 82  2866  408  24.9 10.7   427  2864 39.1  169.1  1987  10425  3
 83  4883  402  72.4  7.3   873  2236 64.9  185.2  2353  28171  4
 84   966  401  24.9 10.6   427  3192 52.2  174.7  2446  15981  2
 85  2109  403  41.2 10.3   520  2539 45.2  183.1  2308  16240  3
 86  2449  395  68.4  9.6   681  2864 63.2  207.4  2651  25149  2
 87  2618  385  31.7  6.1   836  2159 48.0  145.6  1992  25046  3
 88  1465  374  30.3  6.8   598  6456 50.6  164.7  2201  26428  3
 89  1704  375  52.1 10.5   379  2491 55.6  173.2  2662  18599  2
 90  1750  370  49.3  9.7   446  3472 58.2  176.5  2439  16529  2
 91  1489  369  58.8  9.5   911  5720 56.5  175.1  2264  26032  3
 92  8152  363  22.3  9.1   405  1254 51.7  165.6  2257  28351  4
 93  2207  364  57.3  9.7   356  2167 45.5  165.9  2331  19138  3
 94  7874  360  44.4  6.9   398  1365 65.2  174.2  2410  33687  4
 95   655  364  75.2  6.6   425  3879 51.6  163.0  2088  15623  3
 96  1803  362  35.3 10.4   483  2137 53.7  168.9  2666  16405  2
 97  2363  356  53.1 10.6   565  2717 49.3  146.4  1996  19212  3
 98  1435  352  13.4 11.7   342  1076 44.7  156.8  2165  11273  1
 99   946  348  16.4 11.1   366  1455 43.9  163.8  2178   8116  1
100  1136  333  58.6  9.7   448  2630 68.1  171.4  2396  20465  2
101  2658  327  39.0 12.2   365  5430 49.9  136.9  1862   9325  1
102   228  317  31.1 10.2   667  3179 52.8  156.5  2264  19410  1
103  1758  310  56.8 11.5   565  2081 65.3  131.2  1939  17379  4
104  1198  313  55.1  8.0  1171  3877 71.2  172.3  2038  18676  2
105  1412  311  39.2 11.3   436  1837 49.4  154.2  2098  25714  4
106  2071  306  19.9 11.3   470  2531 58.9  133.1  1782  11161  1
107   862  302  26.3 13.4   423  1929 43.3  145.5  2010   7699  1
108  1526  303  71.7  7.7   413  1636 47.1  125.8  1692  20038  3
109  1758  297  33.2 11.6   296  2652 45.3  114.4  1641  12467  3
110  1651  296  64.6  8.9   774  5431 56.1  136.9  1724  14468  3
111  1493  294  64.8  8.9   863  3289 53.7  154.7  1787  15871  3
112  1610  294  59.8  9.5   471  4633 62.9  116.1  1851  18651  4
113  2710  288  63.7  6.2   357  1277 72.8  110.9  1639  18173  4
114  1975  291  46.5 12.6   405  2896 51.5  133.8  1853  12787  2
115  1920  291  49.8  7.8   283  1306 53.2  126.9  1553  12315  3
116  1404  289  38.5 10.0   299  1766 56.2  138.6  1776  11715  2
117  2737  287  45.0 10.5   602  1462 71.3  131.4  1980  18208  4
118  1700  287  18.8  8.0   739  3381 45.9  120.4  1616  14534  3
119   909  277  41.2 11.5   307  1309 54.2  131.9  1762  13722  2
120  1858  277  24.3 13.7   354  1562 46.3  116.9  1507  19133  3
121  3324  275  49.7  8.4   373   929 62.5  120.5  1918  14776  4
122  1697  274  23.8  7.2   338  1610 51.0  105.9  1354  19317  3
123   813  272  46.0  9.8   293  1693 58.4  119.9  1688  10402  1
124  7397  267  47.3 12.5   355  2042 56.2  113.7  1654  12273  2
125  1165  268  43.7  9.4   450  2070 57.5  129.4  1719  16226  2
126   802  268  52.6  9.8   392  1425 52.2  129.6  1816  13230  2
127  1770  268  14.8 12.2   285  2804 44.1  106.7  1537   4205  1
128   495  264  50.7  7.8   220  1177 52.6  119.5  1661   8398  2
129  1255  261  26.0 10.7   458  1646 51.6  113.0  1725  10208  3
130  1148  589  45.3 11.1   891  5790 54.0  277.0  3510  29237  1
131  1509  643  37.6 12.0  1087  4900 51.4  319.6  3982  29058  1
132  2013  254  61.7  9.7   273  1484 50.9  106.7  1412  14446  3
133   711  250  42.4  6.1  1411  3659 67.5  131.0  1790  16228  2
134   471  251  46.3  8.6   219  1128 47.8  105.3  1458  13474  2
135  4552  249  54.4  9.1   329   719 61.9  118.0  1386  15596  4
136  1400  242  50.8  8.0   290  1271 45.7  104.4  1351  10391  3
137  1511  236  38.7 10.7   348  1093 50.4  127.2  1452  16676  4
138  1543  232  39.6  8.1   159   481 30.3   80.6   769   8436  3
139  1011  233  37.8 10.5   264   964 70.7   93.2  1337  14018  3
140   813  232  13.4 10.9   371  4355 58.0   97.0  1589   8428  1
141   654  231  28.8  3.9   140  1296 55.1   66.9  1148  15884  3

SMSA data set (concluded)


SMSA Identifications

  1  NEW YORK, NY
  2  LOS ANGELES, CA
  3  CHICAGO, IL
  4  PHILADELPHIA, PA
  5  DETROIT, MI
  6  SAN FRANCISCO, CA
  7  WASHINGTON, DC
  8  NASSAU, NY
  9  DALLAS, TX
 10  HOUSTON, TX
 11  ST. LOUIS, MO
 12  PITTSBURGH, PA
 13  BALTIMORE, MD
 14  MINNEAPOLIS, MN
 15  NEWARK, NJ
 16  CLEVELAND, OH
 17  ATLANTA, GA
 18  ANAHEIM, CA
 19  SAN DIEGO, CA
 20  DENVER, CO
 21  MIAMI, FL
 22  SEATTLE, WA
 23  MILWAUKEE, WI
 24  TAMPA, FL
 25  CINCINNATI, OH
 26  BUFFALO, NY
 27  RIVERSIDE, CA
 28  KANSAS CITY, MO
 29  PHOENIX, AZ
 30  SAN JOSE, CA
 31  INDIANAPOLIS, IN
 32  NEW ORLEANS, LA
 33  PORTLAND, OR
 34  COLUMBUS, OH
 35  SAN ANTONIO, TX
 36  ROCHESTER, NY
 37  SACRAMENTO, CA
 38  LOUISVILLE, KY
 39  MEMPHIS, TN
 40  FT. LAUDERDALE, FL
 41  DAYTON, OH
 42  SALT LAKE CITY, UT
 43  BIRMINGHAM, AL
 44  ALBANY, NY
 45  TOLEDO, OH
 46  GREENSBORO, NC
 47  OKLAHOMA CITY, OK
 48  NASHVILLE, TN
 49  HONOLULU, HI
 50  JACKSONVILLE, FL
 51  AKRON, OH
 52  SYRACUSE, NY
 53  GARY, IN
 54  NORTHEAST, PA
 55  ALLENTOWN, PA
 56  TULSA, OK
 57  CHARLOTTE, NC
 58  ORLANDO, FL
 59  NEW BRUNSWICK, NJ
 60  OMAHA, NE
 61  GRAND RAPIDS, MI
 62  JERSEY CITY, NJ
 63  YOUNGSTOWN, OH
 64  GREENVILLE, SC
 65  FLINT, MI
 66  WILMINGTON, DE
 67  LONG BRANCH, NJ
 68  RALEIGH, NC
 69  W. PALM BEACH, FL
 70  AUSTIN, TX
 71  FRESNO, CA
 72  OXNARD, CA
 73  PATERSON, NJ
 74  TUCSON, AZ
 75  LANSING, MI
 76  KNOXVILLE, TN
 77  BATON ROUGE, LA
 78  EL PASO, TX
 79  HARRISBURG, PA
 80  TACOMA, WA
 81  MOBILE, AL
 82  JOHNSON CITY, TN
 83  ALBUQUERQUE, NM
 84  CANTON, OH
 85  CHATTANOOGA, TN
 86  WICHITA, KS
 87  CHARLESTON, SC
 88  COLUMBIA, SC
 89  DAVENPORT, IA
 90  FORT WAYNE, IN
 91  LITTLE ROCK, AR
 92  BAKERSFIELD, CA
 93  BEAUMONT, TX
 94  LAS VEGAS, NV
 95  NEWPORT NEWS, VA
 96  PEORIA, IL
 97  SHREVEPORT, LA
 98  YORK, PA
 99  LANCASTER, PA
100  DES MOINES, IA
101  UTICA, NY
102  TRENTON, NJ
103  SPOKANE, WA
104  MADISON, WI
105  STOCKTON, CA
106  BINGHAMTON, NY
107  READING, PA
108  CORPUS CHRISTI, TX
109  HUNTINGTON, WV
110  JACKSON, MS
111  LEXINGTON, KY
112  VALLEJO, CA
113  COLORADO SPRINGS, CO
114  EVANSVILLE, IN
115  HUNTSVILLE, AL
116  APPLETON, WI
117  SANTA BARBARA, CA
118  AUGUSTA, GA
119  SOUTH BEND, IN
120  LAKELAND, FL
121  SALINAS, CA
122  PENSACOLA, FL
123  ERIE, PA
124  DULUTH, MN
125  KALAMAZOO, MI
126  ROCKFORD, IL
127  JOHNSTOWN, PA
128  LORAIN, OH
129  CHARLESTON, WV
130  SPRINGFIELD, MA
131  WORCESTER, MA
132  MONTGOMERY, AL
133  ANN ARBOR, MI
134  HAMILTON, OH
135  EUGENE, OR
136  MACON, GA
137  MODESTO, CA
138  MCALLEN, TX
139  MELBOURNE, FL
140  POUGHKEEPSIE, NY
141  FAYETTEVILLE, NC


Index 


A 

Addition  theorem,  2 

Adjusted  coefficient  of  multiple  determination, 
241-42 

All-possible-regressions  selection  procedure, 
421-29 

Allocated  codes,  351-52 
Analysis  of  variance,  84-86 
Analysis  of  variance  models,  343 
Analysis  of  variance  table,  89-90 
ANOVA  table,  89-90 
Asymptotic  normality,  70 
Autocorrelation,  444-48 
remedial  measures,  454-60 
test  for,  450-54 
Autocorrelation  parameter,  448 
Autoregressive  error  model;  see  Regression  model 

B 

Backward elimination selection procedure, 435-36

Berkson  model,  166-67 
“Best”  subsets  algorithms,  429 
Beta  coefficient,  262 
Biased  estimation,  394-95 
Binary  variable,  330,  354;  see  also  Indicator 
variable 

Bivariate  normal  distribution,  492-96 
BMDP,  113,  251,  431 
Bonferroni  joint  estimation  procedure 
for  inverse  predictions,  174 
for  mean  responses,  158-59,  245 


for  prediction  of  new  observations,  159-60, 
246-47 

for  regression  coefficients,  150-54,  243 

C 

Cp  criterion,  426-28 
Calibration  problem,  174 
Central  limit  theorem,  6 
Chi-square  distribution,  7-8 
table  of  percentiles,  520 
Cochran’s  theorem,  92 
Coefficient  of  multiple  correlation,  242,  506 
inferences,  506-7 

Coefficient  of  multiple  determination,  241,  506 
adjusted,  241-42 
inferences,  506-7 

Coefficient  of  partial  correlation,  288-89,  507-8 
first-order,  508 
inferences,  508 
second-order,  508 

Coefficient  of  partial  determination,  286-88, 
507-8 

inferences,  508 

Coefficient  of  simple  correlation,  97-99,  494 
inferences,  502-4 

Coefficient  of  simple  determination,  96-97, 
501-2 

inferences,  502-4 
Column  vector,  187 
Complementary  event,  3 
Conditional  probability,  3 




Conditional  probability  function,  4 
Confidence  coefficient,  interpretation  of,  70-71, 
84 

Confidence  set,  152 
Consistent  estimator,  9 
Contour  diagram,  235-37,  494-95 
Cook’s  distance  measure,  407-9 
Correction  for  mean  sum  of  squares,  90 
Correlation  coefficient;  see  Coefficient  of  multiple 
correlation;  Coefficient  of  partial  correlation; 
and  Coefficient  of  simple  correlation 
Correlation  index,  312 
Correlation  matrix,  382 

of  the  independent  variables,  381 
Correlation  model,  491-92 
bivariate  normal,  492-96 
multivariate  normal,  505 
Correlation  transformation,  378-79 
Covariance 

of  two  functions  of  random  variables,  6 
of  two  random  variables,  5 
Covariance  models,  343 
Cox, D. R., 176

D 

Degrees  of  freedom,  7-8 
Deleted  residual,  405-6 
Denominator  degrees  of  freedom,  8 
Dependent  variable,  25,  28 
Determinant  of  matrix,  202 
Diagonal  matrix,  196-97 
Disturbance  term,  445 

Dummy  variable,  34,  330;  see  also  Indicator 
variable 

Durbin-Watson  test,  450-54 
table  of  test  bounds,  530-31 

E 

Error  mean  square,  47 
Error  sum  of  squares,  47 
Error  term,  31 

nonconstancy  of  error  variance,  113-14,  123, 
133 

nonindependence  of,  116-18,  123,  133 
nonnormality  of,  118-20,  123,  133 
Error  term  variance,  31,  46-48,  50 
Expected  mean  square,  90-91 
Expected  value 

of  function  of  random  variables,  5-6 
of  random  variable,  3 
Experimental  data,  35 
Exponential  regression  function,  468 
Extra  sum  of  squares,  282-86 

F 

F  distribution,  8-9 

table  of  percentiles,  521-27 
Family  of  estimates,  150 
Family  confidence  coefficient,  150 


First  differences,  458-59 
First-order  autoregressive  error  model,  448-50 
first  differences  approach,  458-60 
iterative  estimation  approach,  455-58 
test  for  autocorrelation,  450-54 
First-order  regression  model,  31,  227,  229-30; 

see  also  Regression  model 
Fisher,  R.  A.,  503 
Fitted  value,  41 

in  terms  of  hat  matrix,  401 
Forward  selection  procedure,  435 
Full  model,  95 
Functional  relation,  24 

G-H 

Gauss-Markov  theorem,  39-40,  64 
Gauss-Newton  method,  472-79 
General  linear  regression  model,  230-34,  237-38 
General  linear  test,  94-96,  293-96 

Hat matrix, 220-21
Heteroscedasticity, 170
Homoscedasticity,  170 
Hyperplane,  230 

I 

Idempotent  matrix,  221 
Identity  matrix,  197 
Independence  of  random  variables,  5 
Independent  variable,  25,  28 
Indicator  variable,  329-30,  353-54 
in  comparing  regression  functions,  343-45 
as  dependent  variable,  354-57 
in  piecewise  linear  regression,  346-50 
in  time  series  model,  350-51 
Influential  observations,  407-9 
Instrumental  variable,  165-66 
Interaction  effect,  232-37 
with  indicator  variables,  335-39 
Interaction  effect  coefficient,  304 
Intrinsically  linear  regression  model,  467 
Inverse  of  matrix,  200-204 
Inverse  prediction,  172-74 

J-L 

Joint  confidence  region  for  regression 
coefficients,  147-50,  217,  243 
Joint  probability  function,  4 

Lack  of  fit  mean  square,  129 

Lack  of  fit  sum  of  squares,  128 

Lack  of  fit  test,  123-32,  245-46 

Least  absolute  deviations  estimation,  410-11 

Least  squares  criterion,  36 

Least  squares  estimation,  10 

control  of  roundoff  errors,  377-82 
multiple  regression,  238-39 
nonlinear  regression,  470-80 
simple linear regression, 36-40, 44-46, 210-12

weighted,  167-72,  219-20,  263 




Leverage,  402 
Likelihood  function,  9 
Linear  dependence,  199-200 
Linear  effect  coefficient,  301 
Linear  model,  31,  466-67;  see  also  General 
linear  regression  model 
general  linear  test,  94-96,  293-96 
Linear  regression  model;  see  Regression  model 
Linearity,  test  for,  123-32 
Logistic  regression  function,  361-62,  468-69 
Logistic  transformation,  362 
Logit  transformation,  362 

M 

Marginal  probability  function,  4 
Marquardt  algorithm,  479-80 
Matrix 

addition,  190-91 
with  all  elements  1,  198 
definition,  185-87 
determinant,  202 
diagonal,  196-97 
dimension,  186 
elements,  186 
equality  of  two,  189 
hat,  220-21 
idempotent,  221 
identity,  197 
inverse,  200-204 
multiplication  by  matrix,  192-96 
multiplication  by  scalar,  192 
nonsingular,  201 
of  quadratic  form,  215 
random,  205-8 
rank,  200 
scalar,  197-98 
singular,  201 
square,  187 
subtraction,  190-91 
symmetric,  196 
theorems,  204-5 
transpose,  188-89 
vector,  187-88 
zero  vector,  198-99 
Maximum  likelihood  estimation,  9-10 
of  regression  parameters,  50-51 
Mean,  of  population 
estimation  of 

difference  between  two,  14-16 
single,  11 
test  concerning 

difference  between  two,  14-16 
single,  11-12 
Mean  response,  41 
multiple  regression 
estimation,  244 
joint  estimation,  245 
simple  linear  regression 
interval  estimation,  75-76,  217
joint  estimation,  157-59 
point  estimation,  41-43 
Mean  square,  46,  88 
expected  value  of,  90-91 
Mean  squared  error 

of  regression  coefficient,  395 
total,  of  n  fitted  values,  426 
Measurement  errors  in  observations,  164-67 
Method  of  steepest  descent,  479 
Minimum  absolute  deviations  method,  411 
Minimum  L1-norm  method,  411
Minimum  sum  of  absolute  deviations  method,  411 
Minimum  variance  estimator,  9 
MSE_p  criterion,  423-25
Multicollinearity,  271-78,  382-90 
detection  of,  390-93
remedial  measures,  393-400 
Multiple  correlation;  see  Coefficient  of  multiple 
correlation 

Multiple  regression;  see  Mean  response;
Prediction  of  new  observation;  Regression
coefficients;  Regression  function;  Regression 
model;  and  Selection  of  independent 
variables 

Multiplication  theorem,  3 
Multivariate  normal  distribution,  505 

N 

Noncentrality  parameter,  71 
Nonexperimental  data,  35 
Nonlinear  regression  model,  468-69 
inferences  about  parameters,  480-83 
least  squares  estimation,  470-80 
Nonsingular  matrix,  201 
Normal  equations,  38 

Normal  error  regression  model;  see  Regression 
model 

Normal  probability  distribution,  6-7 
table  of  areas  and  percentiles,  517 
Normal  probability  plot,  118-20 
Numerator  degrees  of  freedom,  8 

O 

Observation,  25 
Observational  data,  35 
Observed  value,  41 
Orthogonal  polynomials,  319 
Outlier,  114-16,  123 
identification  of,  400-407 
Overall  F  test,  281,  289 

P 

P-value,  12-13 
Paired  observations,  15-16 
Partial  correlation;  see  Coefficient  of  partial 
correlation 

Partial  F  test,  281,  289 


Partial  regression  coefficient,  229 
Piecewise  linear  regression,  346-50 
Point  estimator,  38 

Polynomial  regression  model,  300-305 
Power  of  tests  for  regression  coefficients,  71-72 
Prediction,  of  new  observation 
inverse,  172-74 
multiple  regression,  246-47 
simple  linear  regression,  76-82,  159-60,  218-19

Prediction  bias,  437 
Prediction  interval,  77-78 
Predictor  variable,  25,  28 
Probit  transformation,  366 
Product  operator,  2 
Pure  error  mean  square,  127-28 
Pure  error  sum  of  squares,  127 

Q 

Quadratic  effect  coefficient,  301 
Quadratic  form,  215 
Quadratic  response  function,  301 
estimation  of  maximum  or  minimum,  317-19 
Quantal  response,  354 

R 

R_a^2  criterion,  423-25
R_p^2  criterion,  422-23
Random  matrix,  205-8 
Random  vector,  205-8 
Rank  of  matrix,  200 
Reduced  model,  95 

Regression;  see  Mean  response;  Prediction  of  new 
observation;  Regression  coefficients; 
Regression  function;  and  Regression  model 
Regression,  through  origin,  160-64 
Regression  coefficients 
multiple  regression,  227-29 

danger  in  simultaneous  tests,  278-82 
interval  estimation,  243 
joint  estimation,  243 
point  estimation,  238-39,  263 
tests  concerning,  243,  285-86,  289-93 
variance-covariance  matrix  of,  242,  263 
partial,  229 

simple  linear  regression,  33-34 
interval  estimation,  65-67,  69-70 
joint  estimation,  147-54,  217 
point  estimation,  36-40,  50-51,  167-72,  210-12,  219-20

tests  concerning,  67-68,  71-72,  92-94 
variance-covariance  matrix  of,  216-17 
standardized,  261-63 

Regression  curve,  27-28;  see  also  Regression 
function 

Regression  function,  27-28 

comparison  of  two  or  more,  343-45 
confidence  band,  simple  linear  regression,  154-57


confidence  region,  multiple  regression,  244 
estimated  regression  function,  41-43 
test  for  fit,  123-32,  245-46 
test  for  regression  relation,  92-93,  240-41 
transformations  to  linearize,  134-41 
Regression  mean  square,  88 
Regression  model,  26-29 
effect  of  measurement  errors,  164-67 
first-order  autoregressive,  448-50 
general  linear,  230-34,  237-38 
multiple,  226-30 

with  interaction  effects,  232-37,  335-39 
in  matrix  terms,  237-38 
nonlinear,  468-69 
polynomial,  300-305 
residual  analysis  for  aptness,  111-22 
scope  of,  30,  261 
second-order  autoregressive,  460 
selection  of  independent  variables 
all  possible  regressions,  421-29 
backward  elimination,  435-36 
forward  selection,  435 
ridge  regression,  436 
stepwise  regression,  430-35 
simple  linear 

error  term  distribution  unspecified,  31-35 
in  matrix  terms,  208-10 
normal  error  terms,  48-49 
through  origin,  160-64 
X  is  random,  83-84 
uses  of,  30-31 

Regression  sum  of  squares,  86 
decompositions,  284-86 
Regression  surface,  227-28 
Replication,  124 
Residual,  43-44 
deleted,  405-6 
properties  of,  110-11 
standardized,  110 
studentized,  405 
studentized  deleted,  406 
in  terms  of  hat  matrix,  220-21 
variance-covariance  matrix  of,  221,  402 
Residual  analysis,  111-22 
Residual  mean  square,  47 
Residual  plot,  111-13 
Residual  sum  of  squares,  47 
Response,  41 

binary  or  quantal,  354 
Response  function;  see  Regression  function 
Response  surface,  227-28 
Response  variable,  25,  28 
Restricted  model,  95 
Ridge  regression,  394-400 
use  for  selecting  independent  variables,  436 
Ridge  trace,  396-97 

Roundoff  errors  in  least  squares  calculations,  377-78

Row  vector,  188 


S 

SAS,  259 
Scalar,  192 
Scalar  matrix,  197-98 
Scatter  diagram,  25 
Scatter  plot,  25 

Scheffé  joint  estimation  procedure
for  inverse  predictions,  174
for  prediction  of  new  observations,  159-60,  246

Scope  of  model,  30,  261 

Second-order  autoregressive  error  model,  460 

Second-order  regression  model,  300-301,  303-5;
see  also  Regression  model
Selection  of  independent  variables,  29,  417-19 
SENIC  data  set,  533-36 
Serial  correlation,  444 

Simple  linear  regression  model,  31;  see  also 
Regression  model 
Singular  matrix,  201 
SMSA  data  set,  537-41 
SPSS,  52 

Square  matrix,  187 

Standard  normal  variable,  7 

Standardized  regression  coefficient,  261-63 

Standardized  residual,  110 

Statement  confidence  coefficient,  150 

Statistical  relation,  25-26 

Stepwise  regression  selection  procedure,  430-35 

Studentized  deleted  residual,  406 

Studentized  residual,  405 

Sufficient  estimator,  9 

Sum  of  squares,  46 

as  quadratic  form,  215-16 
Summation  operator,  1-2 
Symmetric  matrix,  196 

T-U 

t  distribution,  8 

table  of  percentiles,  518-19 
t  test  power  function  charts,  528-29 
Third-order  regression  model,  302;  see  also 
Regression  model 


Total  deviation,  87 
Total  sum  of  squares,  85 
Total  uncorrected  sum  of  squares,  90 
Transformations  of  variables,  134-41 
Transpose  of  matrix,  188-89 
Trial,  25 

Unbiased  estimator,  9 
Unrestricted  model,  95 

V

Variance 

of  error  term,  31,  46-48,  50 
estimation  of 

ratio  of  two,  17-18 
single,  16 

of  function  of  random  variables,  5-6 
of  random  variable,  4 
test  concerning 

ratio  of  two,  18-19 
single,  16-17 

Variance-covariance  matrix,  206-7 

of  regression  coefficients,  216-17,  220,  242,  263

of  residuals,  221,  402 
Variance  inflation  factor,  391-93 
Vector,  187-88 
with  all  elements  0,  198-99 
with  all  elements  1,  198 
random,  205-8 

W-Z 

Weighted  least  squares,  167-72,  219-20,  263

Westwood  Company,  25-26 
Working-Hotelling  confidence  region 
multiple  regression,  244 
simple  linear  regression,  154-58 
use  for  joint  estimation  of  mean  responses,  157-58,  245

z'  transformation,  503 
table,  532 

Zarthan  Company,  247 


This  book  has  been  set  CRT  10  and  9  point  Times 
Roman,  leaded  2  points.  Part  numbers  are  16  point 
Helvetica  Bold  and  part  titles  are  20  point  Helvetica 
Bold;  chapter  numbers  are  20  point  Helvetica  Bold 
and  chapter  titles  are  18  point  Helvetica  Bold.  The 
size  of  the  type  page  is  30  picas  by  47  picas. 


